Preface


Before there were computers, there were algorithms. But now that there are
computers, there are even more algorithms, and algorithms lie at the heart of computing.

This book provides a comprehensive introduction to the modern study of
computer algorithms. It presents many algorithms and covers them in considerable
depth,  yet  makes  their  design  and  analysis  accessible  to  all  levels  of  readers.  We 
have  tried  to  keep  explanations  elementary  without  sacrificing  depth  of  coverage 
or  mathematical  rigor. 

Each  chapter  presents  an  algorithm,  a  design  technique,  an  application  area,  or  a 
related  topic.  Algorithms  are  described  in  English  and  in  a  pseudocode  designed  to 
be readable by anyone who has done a little programming. The book contains 244
figures, many with multiple parts, illustrating how the algorithms work. Since
we  emphasize  efficiency  as  a  design  criterion,  we  include  careful  analyses  of  the 
running  times  of  all  our  algorithms. 

The  text  is  intended  primarily  for  use  in  undergraduate  or  graduate  courses  in 
algorithms  or  data  structures.  Because  it  discusses  engineering  issues  in  algorithm 
design,  as  well  as  mathematical  aspects,  it  is  equally  well  suited  for  self-study  by 
technical  professionals. 

In  this,  the  third  edition,  we  have  once  again  updated  the  entire  book.  The 
changes  cover  a  broad  spectrum,  including  new  chapters,  revised  pseudocode,  and 
a  more  active  writing  style. 

To  the  teacher 

We  have  designed  this  book  to  be  both  versatile  and  complete.  You  should  find  it 
useful  for  a  variety  of  courses,  from  an  undergraduate  course  in  data  structures  up 
through  a  graduate  course  in  algorithms.  Because  we  have  provided  considerably 
more  material  than  can  fit  in  a  typical  one-term  course,  you  can  consider  this  book 
to  be  a  “buffet”  or  “smorgasbord”  from  which  you  can  pick  and  choose  the  material 
that  best  supports  the  course  you  wish  to  teach. 

You should find it easy to organize your course around just the chapters you
need.  We  have  made  chapters  relatively  self-contained,  so  that  you  need  not  worry 
about  an  unexpected  and  unnecessary  dependence  of  one  chapter  on  another.  Each 
chapter  presents  the  easier  material  first  and  the  more  difficult  material  later,  with 
section  boundaries  marking  natural  stopping  points.  In  an  undergraduate  course, 
you  might  use  only  the  earlier  sections  from  a  chapter;  in  a  graduate  course,  you 
might  cover  the  entire  chapter. 

We have included 957 exercises and 158 problems. Each section ends with exercises,
and each chapter ends with problems. The exercises are generally short questions
that test basic mastery of the material. Some are simple self-check thought
exercises, whereas others are more substantial and are suitable as assigned homework.
The problems are more elaborate case studies that often introduce new material;
they often consist of several questions that lead the student through the steps
required to arrive at a solution.

Departing  from  our  practice  in  previous  editions  of  this  book,  we  have  made 
publicly available solutions to some, but by no means all, of the problems and
exercises. Our Web site, http://mitpress.mit.edu/algorithms/, links to these solutions.
You  will  want  to  check  this  site  to  make  sure  that  it  does  not  contain  the  solution  to 
an  exercise  or  problem  that  you  plan  to  assign.  We  expect  the  set  of  solutions  that 
we  post  to  grow  slowly  over  time,  so  you  will  need  to  check  it  each  time  you  teach 
the  course. 

We  have  starred  (*)  the  sections  and  exercises  that  are  more  suitable  for  graduate 
students than for undergraduates. A starred section is not necessarily more
difficult than an unstarred one, but it may require an understanding of more advanced
mathematics.  Likewise,  starred  exercises  may  require  an  advanced  background  or 
more  than  average  creativity. 

To  the  student 

We  hope  that  this  textbook  provides  you  with  an  enjoyable  introduction  to  the 
field  of  algorithms.  We  have  attempted  to  make  every  algorithm  accessible  and 
interesting.  To  help  you  when  you  encounter  unfamiliar  or  difficult  algorithms,  we 
describe  each  one  in  a  step-by-step  manner.  We  also  provide  careful  explanations 
of  the  mathematics  needed  to  understand  the  analysis  of  the  algorithms.  If  you 
already  have  some  familiarity  with  a  topic,  you  will  find  the  chapters  organized  so 
that  you  can  skim  introductory  sections  and  proceed  quickly  to  the  more  advanced 
material. 

This  is  a  large  book,  and  your  class  will  probably  cover  only  a  portion  of  its 
material.  We  have  tried,  however,  to  make  this  a  book  that  will  be  useful  to  you 
now  as  a  course  textbook  and  also  later  in  your  career  as  a  mathematical  desk 
reference  or  an  engineering  handbook. 

What are the prerequisites for reading this book?

• You should have some programming experience. In particular, you should
understand recursive procedures and simple data structures such as arrays and
linked lists.

•  You  should  have  some  facility  with  mathematical  proofs,  and  especially  proofs 
by  mathematical  induction.  A  few  portions  of  the  book  rely  on  some  knowledge 
of  elementary  calculus.  Beyond  that,  Parts  I  and  VIII  of  this  book  teach  you  all 
the  mathematical  techniques  you  will  need. 

We  have  heard,  loud  and  clear,  the  call  to  supply  solutions  to  problems  and 
exercises.  Our  Web  site,  http://mitpress.mit.edu/algorithms/,  links  to  solutions  for 
a  few  of  the  problems  and  exercises.  Feel  free  to  check  your  solutions  against  ours. 
We  ask,  however,  that  you  do  not  send  your  solutions  to  us. 

To  the  professional 

The wide range of topics in this book makes it an excellent handbook on
algorithms. Because each chapter is relatively self-contained, you can focus in on the
topics  that  most  interest  you. 

Most  of  the  algorithms  we  discuss  have  great  practical  utility.  We  therefore 
address  implementation  concerns  and  other  engineering  issues.  We  often  provide 
practical  alternatives  to  the  few  algorithms  that  are  primarily  of  theoretical  interest. 

If you wish to implement any of the algorithms, you should find the translation
of our pseudocode into your favorite programming language to be a fairly
straightforward task. We have designed the pseudocode to present each algorithm
clearly and succinctly. Consequently, we do not address error-handling and other
software-engineering issues that require specific assumptions about your programming
environment. We attempt to present each algorithm simply and directly without
allowing the idiosyncrasies of a particular programming language to obscure
its essence.

We  understand  that  if  you  are  using  this  book  outside  of  a  course,  then  you 
might  be  unable  to  check  your  solutions  to  problems  and  exercises  against  solutions 
provided  by  an  instructor.  Our  Web  site,  http://mitpress.mit.edu/algorithms/,  links 
to  solutions  for  some  of  the  problems  and  exercises  so  that  you  can  check  your 
work.  Please  do  not  send  your  solutions  to  us. 

To  our  colleagues 

We  have  supplied  an  extensive  bibliography  and  pointers  to  the  current  literature. 
Each chapter ends with a set of chapter notes that give historical details and
references. The chapter notes do not provide a complete reference to the whole field
of algorithms, however. Though it may be hard to believe for a book of this size,
space constraints prevented us from including many interesting algorithms.

Despite  myriad  requests  from  students  for  solutions  to  problems  and  exercises, 
we  have  chosen  as  a  matter  of  policy  not  to  supply  references  for  problems  and 
exercises,  to  remove  the  temptation  for  students  to  look  up  a  solution  rather  than  to 
find  it  themselves. 

Changes  for  the  third  edition 

What has changed between the second and third editions of this book? The
magnitude of the changes is on a par with the changes between the first and second
editions.  As  we  said  about  the  second-edition  changes,  depending  on  how  you 
look  at  it,  the  book  changed  either  not  much  or  quite  a  bit. 

A quick look at the table of contents shows that most of the second-edition
chapters and sections appear in the third edition. We removed two chapters and one
section,  but  we  have  added  three  new  chapters  and  two  new  sections  apart  from 
these  new  chapters. 

We kept the hybrid organization from the first two editions. Rather than
organizing chapters by only problem domains or according only to techniques, this book
has elements of both. It contains technique-based chapters on divide-and-conquer,
dynamic programming, greedy algorithms, amortized analysis, NP-Completeness,
and approximation algorithms. But it also has entire parts on sorting, on data
structures for dynamic sets, and on algorithms for graph problems. We find that
although you need to know how to apply techniques for designing and analyzing
algorithms, problems seldom announce to you which techniques are most amenable
to solving them.

Here  is  a  summary  of  the  most  significant  changes  for  the  third  edition: 

•  We  added  new  chapters  on  van  Emde  Boas  trees  and  multithreaded  algorithms, 
and  we  have  broken  out  material  on  matrix  basics  into  its  own  appendix  chapter. 

•  We  revised  the  chapter  on  recurrences  to  more  broadly  cover  the  divide-and- 
conquer  technique,  and  its  first  two  sections  apply  divide-and-conquer  to  solve 
two  problems.  The  second  section  of  this  chapter  presents  Strassen’s  algorithm 
for  matrix  multiplication,  which  we  have  moved  from  the  chapter  on  matrix 
operations. 

•  We  removed  two  chapters  that  were  rarely  taught:  binomial  heaps  and  sorting 
networks. One key idea in the sorting networks chapter, the 0-1 principle,
appears in this edition within Problem 8-7 as the 0-1 sorting lemma for
compare-exchange algorithms. The treatment of Fibonacci heaps no longer relies on
binomial  heaps  as  a  precursor. 

• We revised our treatment of dynamic programming and greedy algorithms.
Dynamic programming now leads off with a more interesting problem, rod cutting,
than the assembly-line scheduling problem from the second edition.
Furthermore, we emphasize memoization a bit more than we did in the second edition,
and we introduce the notion of the subproblem graph as a way to understand
the running time of a dynamic-programming algorithm. In our opening
example of greedy algorithms, the activity-selection problem, we get to the greedy
algorithm more directly than we did in the second edition.

•  The  way  we  delete  a  node  from  binary  search  trees  (which  includes  red-black 
trees)  now  guarantees  that  the  node  requested  for  deletion  is  the  node  that  is 
actually  deleted.  In  the  first  two  editions,  in  certain  cases,  some  other  node 
would  be  deleted,  with  its  contents  moving  into  the  node  passed  to  the  deletion 
procedure.  With  our  new  way  to  delete  nodes,  if  other  components  of  a  program 
maintain  pointers  to  nodes  in  the  tree,  they  will  not  mistakenly  end  up  with  stale 
pointers  to  nodes  that  have  been  deleted. 

• The material on flow networks now bases flows entirely on edges. This
approach is more intuitive than the net flow used in the first two editions.

•  With  the  material  on  matrix  basics  and  Strassen’s  algorithm  moved  to  other 
chapters,  the  chapter  on  matrix  operations  is  smaller  than  in  the  second  edition. 

• We have modified our treatment of the Knuth-Morris-Pratt string-matching
algorithm.

•  We  corrected  several  errors.  Most  of  these  errors  were  posted  on  our  Web  site 
of  second-edition  errata,  but  a  few  were  not. 

•  Based  on  many  requests,  we  changed  the  syntax  (as  it  were)  of  our  pseudocode. 
We now use “=” to indicate assignment and “==” to test for equality, just as C,
C++,  Java,  and  Python  do.  Likewise,  we  have  eliminated  the  keywords  do  and 
then  and  adopted  “//”  as  our  comment-to-end-of-line  symbol.  We  also  now  use 
dot-notation  to  indicate  object  attributes.  Our  pseudocode  remains  procedural, 
rather  than  object-oriented.  In  other  words,  rather  than  running  methods  on 
objects,  we  simply  call  procedures,  passing  objects  as  parameters. 

•  We  added  100  new  exercises  and  28  new  problems.  We  also  updated  many 
bibliography  entries  and  added  several  new  ones. 

•  Finally,  we  went  through  the  entire  book  and  rewrote  sentences,  paragraphs, 
and  sections  to  make  the  writing  clearer  and  more  active. 

Web site

You can use our Web site, http://mitpress.mit.edu/algorithms/, to obtain
supplementary information and to communicate with us. The Web site links to a list of
known  errors,  solutions  to  selected  exercises  and  problems,  and  (of  course)  a  list 
explaining  the  corny  professor  jokes,  as  well  as  other  content  that  we  might  add. 
The  Web  site  also  tells  you  how  to  report  errors  or  make  suggestions. 

How  we  produced  this  book 

Like the second edition, the third edition was produced in LaTeX2e. We used the
Times font with mathematics typeset using the MathTime Pro 2 fonts. We thank
Michael Spivak from Publish or Perish, Inc., Lance Carnes from Personal TeX,
Inc., and Tim Tregubov from Dartmouth College for technical support. As in the
previous two editions, we compiled the index using Windex, a C program that we
wrote, and the bibliography was produced with BibTeX. The PDF files for this
book were created on a MacBook running OS 10.5.

We drew the illustrations for the third edition using MacDraw Pro, with some
of the mathematical expressions in illustrations laid in with the psfrag package
for LaTeX2e. Unfortunately, MacDraw Pro is legacy software, having not been
marketed for over a decade now. Happily, we still have a couple of Macintoshes
that can run the Classic environment under OS 10.4, and hence they can run
MacDraw Pro, mostly. Even under the Classic environment, we find MacDraw Pro to
be far easier to use than any other drawing software for the types of illustrations
that accompany computer-science text, and it produces beautiful output.[1] Who
knows how long our pre-Intel Macs will continue to run, so if anyone from Apple
is listening: Please create an OS X-compatible version of MacDraw Pro!

Acknowledgments  for  the  third  edition 

terrific relationship it has been! We thank Ellen Faran, Bob Prior, Ada Brunstein,
and  Mary  Reilly  for  their  help  and  support. 

We  were  geographically  distributed  while  producing  the  third  edition,  working 
in  the  Dartmouth  College  Department  of  Computer  Science,  the  MIT  Computer 


[1] We investigated several drawing programs that run under Mac OS X, but all had significant
shortcomings compared with MacDraw Pro. We briefly attempted to produce the illustrations for this
book with a different, well-known drawing program. We found that it took at least five times as long
to produce each illustration as it took with MacDraw Pro, and the resulting illustrations did not look
as good. Hence the decision to revert to MacDraw Pro running on older Macintoshes.
Science and Artificial Intelligence Laboratory, and the Columbia University De¬

for technical copyeditors, Julie is a sure-fire, first-ballot inductee. She is nothing
short  of  phenomenal.  Thank  you,  thank  you,  thank  you,  Julie!  Priya  Natarajan  also 
found  some  errors  that  we  were  able  to  correct  before  this  book  went  to  press.  Any 
errors  that  remain  (and  undoubtedly,  some  do)  are  the  responsibility  of  the  authors 
(and  probably  were  inserted  after  Julie  read  the  material). 

which were in turn influenced by Michael Bender. We also incorporated ideas

Introduction


This  part  will  start  you  thinking  about  designing  and  analyzing  algorithms.  It  is 
intended  to  be  a  gentle  introduction  to  how  we  specify  algorithms,  some  of  the 
design  strategies  we  will  use  throughout  this  book,  and  many  of  the  fundamental 
ideas  used  in  algorithm  analysis.  Later  parts  of  this  book  will  build  upon  this  base. 

Chapter 1 provides an overview of algorithms and their place in modern
computing systems. This chapter defines what an algorithm is and lists some examples.
It also makes a case that we should consider algorithms as a technology,
alongside technologies such as fast hardware, graphical user interfaces, object-oriented
systems, and networks.

In  Chapter  2,  we  see  our  first  algorithms,  which  solve  the  problem  of  sorting 
a  sequence  of  n  numbers.  They  are  written  in  a  pseudocode  which,  although  not 
directly translatable to any conventional programming language, conveys the
structure of the algorithm clearly enough that you should be able to implement it in the
language of your choice. The sorting algorithms we examine are insertion sort,
which uses an incremental approach, and merge sort, which uses a recursive
technique known as “divide-and-conquer.” Although the time each requires increases
with  the  value  of  n,  the  rate  of  increase  differs  between  the  two  algorithms.  We 
determine  these  running  times  in  Chapter  2,  and  we  develop  a  useful  notation  to 
express  them. 
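
As a rough preview of the two approaches just named, here is a minimal Python sketch. It is not the pseudocode of Chapter 2; the function names and the choice of Python are our assumptions for illustration only.

```python
def insertion_sort(a):
    """Incremental approach: grow a sorted prefix one element at a time (in place)."""
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        # Shift every element of the sorted prefix that exceeds key one slot right.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a


def merge_sort(a):
    """Divide-and-conquer: sort each half recursively, then merge (returns a new list)."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the two halves still has elements left over.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Both functions produce the same sorted result; they differ in how quickly their running times grow with n.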

Chapter  3  precisely  defines  this  notation,  which  we  call  asymptotic  notation.  It 
starts by defining several asymptotic notations, which we use for bounding
algorithm running times from above and/or below. The rest of Chapter 3 is primarily
a  presentation  of  mathematical  notation,  more  to  ensure  that  your  use  of  notation 
matches  that  in  this  book  than  to  teach  you  new  mathematical  concepts. 

Chapter 4 delves further into the divide-and-conquer method introduced in
Chapter 2. It provides additional examples of divide-and-conquer algorithms,
including Strassen’s surprising method for multiplying two square matrices.
Chapter 4 contains methods for solving recurrences, which are useful for describing
the running times of recursive algorithms. One powerful technique is the
“master method,” which we often use to solve recurrences that arise from
divide-and-conquer algorithms. Although much of Chapter 4 is devoted to proving the
correctness of the master method, you may skip this proof yet still employ the master
method.
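
To give a flavor of what the master method delivers, the following worked lines apply it to two standard recurrences. This is an informal sketch: it quotes the usual statement of the theorem for recurrences of the form $T(n) = aT(n/b) + f(n)$, and the full conditions are those given in Chapter 4.

```latex
% Merge sort: two subproblems of half the size, linear work to merge.
T(n) = 2\,T(n/2) + \Theta(n), \qquad a = 2,\; b = 2,\; n^{\log_b a} = n .
% f(n) = \Theta(n^{\log_b a}), so the "balanced" case applies:
T(n) = \Theta(n \lg n) .

% Strassen's method: seven subproblems of half the size, quadratic work to combine.
T(n) = 7\,T(n/2) + \Theta(n^2), \qquad a = 7,\; b = 2,\; n^{\log_b a} = n^{\lg 7} .
% f(n) = O(n^{\lg 7 - \epsilon}) for some \epsilon > 0, so the recursion dominates:
T(n) = \Theta(n^{\lg 7}) .
```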

Chapter 5 introduces probabilistic analysis and randomized algorithms. We
typically use probabilistic analysis to determine the running time of an algorithm in
cases  in  which,  due  to  the  presence  of  an  inherent  probability  distribution,  the 
running  time  may  differ  on  different  inputs  of  the  same  size.  In  some  cases,  we 
assume  that  the  inputs  conform  to  a  known  probability  distribution,  so  that  we  are 
averaging  the  running  time  over  all  possible  inputs.  In  other  cases,  the  probability 
distribution  comes  not  from  the  inputs  but  from  random  choices  made  during  the 
course  of  the  algorithm.  An  algorithm  whose  behavior  is  determined  not  only  by  its 
input  but  by  the  values  produced  by  a  random-number  generator  is  a  randomized 
algorithm. We can use randomized algorithms to enforce a probability distribution
on the inputs (thereby ensuring that no particular input always causes poor
performance) or even to bound the error rate of algorithms that are allowed to produce
incorrect results on a limited basis.
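
One common way to enforce a probability distribution on the inputs is to permute the input at random before processing it. The sketch below is a minimal Python illustration under our own assumptions: the function name and the optional `rng` parameter are hypothetical, and uniformity holds only to the extent that the underlying random-number generator is ideal.

```python
import random


def randomize_in_place(a, rng=random):
    """Shuffle list a in place so that each permutation is equally likely.

    rng is anything with a randint(lo, hi) method that is inclusive at both
    ends, such as the random module itself or a random.Random instance.
    """
    n = len(a)
    for i in range(n - 1):
        # Choose the element for position i uniformly from a[i..n-1].
        j = rng.randint(i, n - 1)
        a[i], a[j] = a[j], a[i]
    return a
```

After such a shuffle, an adversary who picks the input can no longer force the worst case, because the order the algorithm actually sees depends on the random choices rather than on the original arrangement.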

Appendices  A-D  contain  other  mathematical  material  that  you  will  find  helpful 
as  you  read  this  book.  You  are  likely  to  have  seen  much  of  the  material  in  the 
appendix  chapters  before  having  read  this  book  (although  the  specific  definitions 
and  notational  conventions  we  use  may  differ  in  some  cases  from  what  you  have 
seen  in  the  past),  and  so  you  should  think  of  the  Appendices  as  reference  material. 
On  the  other  hand,  you  probably  have  not  already  seen  most  of  the  material  in 
Part  I.  All  the  chapters  in  Part  I  and  the  Appendices  are  written  with  a  tutorial 
flavor. 
The Role of Algorithms in Computing


What  are  algorithms?  Why  is  the  study  of  algorithms  worthwhile?  What  is  the  role 
of  algorithms  relative  to  other  technologies  used  in  computers?  In  this  chapter,  we 
will  answer  these  questions. 


1.1  Algorithms 

Informally,  an  algorithm  is  any  well-defined  computational  procedure  that  takes 
some  value,  or  set  of  values,  as  input  and  produces  some  value,  or  set  of  values,  as 
output.  An  algorithm  is  thus  a  sequence  of  computational  steps  that  transform  the 
input  into  the  output. 

We can also view an algorithm as a tool for solving a well-specified
computational problem. The statement of the problem specifies in general terms the desired
input/output relationship. The algorithm describes a specific computational
procedure for achieving that input/output relationship.

For  example,  we  might  need  to  sort  a  sequence  of  numbers  into  nondecreasing 
order.  This  problem  arises  frequently  in  practice  and  provides  fertile  ground  for 
introducing  many  standard  design  techniques  and  analysis  tools.  Here  is  how  we 
formally define the sorting problem:

Input: A sequence of n numbers $\langle a_1, a_2, \ldots, a_n \rangle$.

Output: A permutation (reordering) $\langle a'_1, a'_2, \ldots, a'_n \rangle$ of the input sequence such
that $a'_1 \le a'_2 \le \cdots \le a'_n$.

For example, given the input sequence $\langle 31, 41, 59, 26, 41, 58 \rangle$, a sorting algorithm
returns as output the sequence $\langle 26, 31, 41, 41, 58, 59 \rangle$. Such an input sequence is
called  an  instance  of  the  sorting  problem.  In  general,  an  instance  of  a  problem 
consists  of  the  input  (satisfying  whatever  constraints  are  imposed  in  the  problem 
statement)  needed  to  compute  a  solution  to  the  problem. 
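
The input/output relationship above can be written down as an executable check. The following Python sketch is ours, not part of the formal definition, and the function name is hypothetical. It tests whether a proposed output is a nondecreasing permutation of a given instance.

```python
from collections import Counter


def solves_sorting_instance(instance, output):
    """Return True if output is a correct answer for the given sorting instance."""
    # Permutation: the same values with the same multiplicities.
    is_permutation = Counter(instance) == Counter(output)
    # Nondecreasing: each element is at most its successor.
    is_nondecreasing = all(
        output[i] <= output[i + 1] for i in range(len(output) - 1)
    )
    return is_permutation and is_nondecreasing
```

Note that the check says nothing about how the output was produced; it only captures what the problem statement requires of it.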

Because many programs use it as an intermediate step, sorting is a fundamental
operation in computer science. As a result, we have a large number of good sorting
algorithms at our disposal. Which algorithm is best for a given application depends
on, among other factors, the number of items to be sorted, the extent to which
the  items  are  already  somewhat  sorted,  possible  restrictions  on  the  item  values, 
the  architecture  of  the  computer,  and  the  kind  of  storage  devices  to  be  used:  main 
memory,  disks,  or  even  tapes. 

An  algorithm  is  said  to  be  correct  if,  for  every  input  instance,  it  halts  with  the 
correct  output.  We  say  that  a  correct  algorithm  solves  the  given  computational 
problem.  An  incorrect  algorithm  might  not  halt  at  all  on  some  input  instances,  or  it 
might  halt  with  an  incorrect  answer.  Contrary  to  what  you  might  expect,  incorrect 
algorithms  can  sometimes  be  useful,  if  we  can  control  their  error  rate.  We  shall  see 
an example of an algorithm with a controllable error rate in Chapter 31 when we
study  algorithms  for  finding  large  prime  numbers.  Ordinarily,  however,  we  shall 
be  concerned  only  with  correct  algorithms. 
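
To make the idea of a controllable error rate concrete, here is a minimal Python sketch of one well-known randomized primality test, the Miller-Rabin test. It is our illustration, not the presentation of Chapter 31; the function name, the default number of trials, and the `rng` parameter are assumptions.

```python
import random


def is_probably_prime(n, trials=20, rng=random):
    """Randomized primality test that may err, but only in one direction.

    A return value of False is always correct. A return value of True is wrong
    for a composite n with probability at most 2**(-trials).
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # Write n - 1 as 2**t * u with u odd.
    u, t = n - 1, 0
    while u % 2 == 0:
        u //= 2
        t += 1
    for _ in range(trials):
        a = rng.randrange(2, n - 1)  # random base in [2, n - 2]
        x = pow(a, u, n)
        if x in (1, n - 1):
            continue
        for _ in range(t - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is definitely composite
    return True  # prime, or an unlucky composite
```

Raising `trials` drives the error probability down geometrically, which is the sense in which the error rate is under our control.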

An  algorithm  can  be  specified  in  English,  as  a  computer  program,  or  even  as 
a  hardware  design.  The  only  requirement  is  that  the  specification  must  provide  a 
precise  description  of  the  computational  procedure  to  be  followed. 

What  kinds  of  problems  are  solved  by  algorithms? 

Sorting  is  by  no  means  the  only  computational  problem  for  which  algorithms  have 
been  developed.  (You  probably  suspected  as  much  when  you  saw  the  size  of  this 
book.) Practical applications of algorithms are ubiquitous and include the
following examples:

• The Human Genome Project has made great progress toward the goals of
identifying all the 100,000 genes in human DNA, determining the sequences of the
3 billion chemical base pairs that make up human DNA, storing this
information in databases, and developing tools for data analysis. Each of these steps
requires sophisticated algorithms. Although the solutions to the various
problems involved are beyond the scope of this book, many methods to solve these
biological problems use ideas from several of the chapters in this book, thereby
enabling scientists to accomplish tasks while using resources efficiently. The
savings are in time, both human and machine, and in money, as more
information can be extracted from laboratory techniques.

• The Internet enables people all around the world to quickly access and retrieve
large amounts of information. With the aid of clever algorithms, sites on the
Internet are able to manage and manipulate this large volume of data. Examples
of problems that make essential use of algorithms include finding good routes
on which the data will travel (techniques for solving such problems appear in
Chapter 24), and using a search engine to quickly find pages on which particular
information resides (related techniques are in Chapters 11 and 32).

• Electronic commerce enables goods and services to be negotiated and
exchanged electronically, and it depends on the privacy of personal
information such as credit card numbers, passwords, and bank statements. The core
technologies used in electronic commerce include public-key cryptography and
digital signatures (covered in Chapter 31), which are based on numerical
algorithms and number theory.

•  Manufacturing  and  other  commercial  enterprises  often  need  to  allocate  scarce 
resources  in  the  most  beneficial  way.  An  oil  company  may  wish  to  know  where 
to  place  its  wells  in  order  to  maximize  its  expected  profit.  A  political  candidate 
may  want  to  determine  where  to  spend  money  buying  campaign  advertising  in 
order  to  maximize  the  chances  of  winning  an  election.  An  airline  may  wish 
to  assign  crews  to  flights  in  the  least  expensive  way  possible,  making  sure  that 
each flight is covered and that government regulations regarding crew
scheduling are met. An Internet service provider may wish to determine where to place
additional  resources  in  order  to  serve  its  customers  more  effectively.  All  of 
these  are  examples  of  problems  that  can  be  solved  using  linear  programming, 
which  we  shall  study  in  Chapter  29. 

Although  some  of  the  details  of  these  examples  are  beyond  the  scope  of  this 
book,  we  do  give  underlying  techniques  that  apply  to  these  problems  and  problem 
areas.  We  also  show  how  to  solve  many  specific  problems,  including  the  following: 

•  We  are  given  a  road  map  on  which  the  distance  between  each  pair  of  adjacent 
intersections  is  marked,  and  we  wish  to  determine  the  shortest  route  from  one 
intersection  to  another.  The  number  of  possible  routes  can  be  huge,  even  if  we 
disallow  routes  that  cross  over  themselves.  How  do  we  choose  which  of  all 
possible  routes  is  the  shortest?  Here,  we  model  the  road  map  (which  is  itself 
a  model  of  the  actual  roads)  as  a  graph  (which  we  will  meet  in  Part  VI  and 
Appendix  B),  and  we  wish  to  find  the  shortest  path  from  one  vertex  to  another 
in  the  graph.  We  shall  see  how  to  solve  this  problem  efficiently  in  Chapter  24. 

• We are given two ordered sequences of symbols, $X = \langle x_1, x_2, \ldots, x_m \rangle$ and
$Y = \langle y_1, y_2, \ldots, y_n \rangle$, and we wish to find a longest common subsequence of
X and Y. A subsequence of X is just X with some (or possibly all or none) of
its elements removed. For example, one subsequence of $\langle A, B, C, D, E, F, G \rangle$
would be $\langle B, C, E, G \rangle$. The length of a longest common subsequence of X
and Y gives one measure of how similar these two sequences are. For example,
if the two sequences are base pairs in DNA strands, then we might consider
them similar if they have a long common subsequence. If X has m symbols
and Y has n symbols, then X and Y have $2^m$ and $2^n$ possible subsequences,
respectively. Selecting all possible subsequences of X and Y and matching
them up could take a prohibitively long time unless m and n are very small.
We shall see in Chapter 15 how to use a general technique known as dynamic
programming to solve this problem much more efficiently.

•  We  are  given  a  mechanical  design  in  terms  of  a  library  of  parts,  where  each  part 
may  include  instances  of  other  parts,  and  we  need  to  list  the  parts  in  order  so 
that each part appears before any part that uses it. If the design comprises n
parts, then there are n! possible orders, where n! denotes the factorial function.
Because  the  factorial  function  grows  faster  than  even  an  exponential  function, 
we  cannot  feasibly  generate  each  possible  order  and  then  verify  that,  within 
that  order,  each  part  appears  before  the  parts  using  it  (unless  we  have  only  a 
few  parts).  This  problem  is  an  instance  of  topological  sorting,  and  we  shall  see 
in  Chapter  22  how  to  solve  this  problem  efficiently. 

•  We  are  given  n  points  in  the  plane,  and  we  wish  to  find  the  convex  hull  of 
these  points.  The  convex  hull  is  the  smallest  convex  polygon  containing  the 
points.  Intuitively,  we  can  think  of  each  point  as  being  represented  by  a  nail 
sticking  out  from  a  board.  The  convex  hull  would  be  represented  by  a  tight 
rubber  band  that  surrounds  all  the  nails.  Each  nail  around  which  the  rubber 
band makes a turn is a vertex of the convex hull. (See Figure 33.6 on page 1029
for an example.) Any of the $2^n$ subsets of the points might be the vertices
of  the  convex  hull.  Knowing  which  points  are  vertices  of  the  convex  hull  is 
not  quite  enough,  either,  since  we  also  need  to  know  the  order  in  which  they 
appear.  There  are  many  choices,  therefore,  for  the  vertices  of  the  convex  hull. 
Chapter  33  gives  two  good  methods  for  finding  the  convex  hull. 
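
To make the contrast in the longest-common-subsequence item concrete, here is a minimal Python sketch of the table-filling idea behind dynamic programming. It is only an illustration under our own assumptions: the function name `lcs_length` and the sample inputs are hypothetical, and Chapter 15 develops the technique properly. Instead of enumerating subsequences, the sketch fills a table with (m + 1) rows and (n + 1) columns.

```python
def lcs_length(x, y):
    """Return the length of a longest common subsequence of x and y."""
    m, n = len(x), len(y)
    # table[i][j] holds the LCS length of the prefixes x[:i] and y[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                # Matching symbols extend an LCS of the two shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last symbol of one prefix or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]
```

For the sequences in the example above, `lcs_length("ABCDEFG", "BCEG")` returns 4, since the second sequence is itself a subsequence of the first.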

These  lists  are  far  from  exhaustive  (as  you  again  have  probably  surmised  from 
this book’s heft), but exhibit two characteristics that are common to many interesting
algorithmic problems:

1.  They  have  many  candidate  solutions,  the  overwhelming  majority  of  which  do 
not  solve  the  problem  at  hand.  Finding  one  that  does,  or  one  that  is  “best,”  can 
present  quite  a  challenge. 

2.  They  have  practical  applications.  Of  the  problems  in  the  above  list,  finding  the 
shortest  path  provides  the  easiest  examples.  A  transportation  firm,  such  as  a 
trucking  or  railroad  company,  has  a  financial  interest  in  finding  shortest  paths 
through  a  road  or  rail  network  because  taking  shorter  paths  results  in  lower 
labor  and  fuel  costs.  Or  a  routing  node  on  the  Internet  may  need  to  find  the 
shortest  path  through  the  network  in  order  to  route  a  message  quickly.  Or  a 
person  wishing  to  drive  from  New  York  to  Boston  may  want  to  find  driving 
directions from an appropriate Web site, or she may use her GPS while driving.

Not every problem solved by algorithms has an easily identified set of candidate
solutions. For example, suppose we are given a set of numerical values representing
samples of a signal, and we want to compute the discrete Fourier transform of
these samples. The discrete Fourier transform converts the time domain to the frequency
domain, producing a set of numerical coefficients, so that we can determine
the  strength  of  various  frequencies  in  the  sampled  signal.  In  addition  to  lying  at 
the  heart  of  signal  processing,  discrete  Fourier  transforms  have  applications  in  data 
compression  and  multiplying  large  polynomials  and  integers.  Chapter  30  gives 
an  efficient  algorithm,  the  fast  Fourier  transform  (commonly  called  the  FFT),  for 
this  problem,  and  the  chapter  also  sketches  out  the  design  of  a  hardware  circuit  to 
compute  the  FFT. 
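
As a rough illustration of what is being computed, the following Python sketch evaluates the discrete Fourier transform directly from its definition. It is not the FFT: the two nested loops take time proportional to the square of the number of samples. The function name `naive_dft` is hypothetical, and the sign of the exponent follows a convention common in signal processing, which is an assumption on our part; some texts use the opposite sign.

```python
import cmath

def naive_dft(samples):
    """Compute the discrete Fourier transform directly from its definition."""
    n = len(samples)
    coefficients = []
    for k in range(n):
        # Coefficient k measures the strength of frequency k in the samples.
        total = 0j
        for t, value in enumerate(samples):
            # Assumed sign convention: negative exponent.
            total += value * cmath.exp(-2j * cmath.pi * k * t / n)
        coefficients.append(total)
    return coefficients
```

A constant signal such as `[1, 1, 1, 1]` has all of its strength in coefficient 0, while the other coefficients are zero up to rounding error.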

Data  structures 

This  book  also  contains  several  data  structures.  A  data  structure  is  a  way  to  store 
and  organize  data  in  order  to  facilitate  access  and  modifications.  No  single  data 
structure  works  well  for  all  purposes,  and  so  it  is  important  to  know  the  strengths 
and  limitations  of  several  of  them. 

Technique 

Although  you  can  use  this  book  as  a  “cookbook”  for  algorithms,  you  may  someday 
encounter  a  problem  for  which  you  cannot  readily  find  a  published  algorithm  (many 
of  the  exercises  and  problems  in  this  book,  for  example).  This  book  will  teach  you 
techniques  of  algorithm  design  and  analysis  so  that  you  can  develop  algorithms  on 
your  own,  show  that  they  give  the  correct  answer,  and  understand  their  efficiency. 
Different  chapters  address  different  aspects  of  algorithmic  problem  solving.  Some 
chapters  address  specific  problems,  such  as  finding  medians  and  order  statistics  in 
Chapter  9,  computing  minimum  spanning  trees  in  Chapter  23,  and  determining  a 
maximum  flow  in  a  network  in  Chapter  26.  Other  chapters  address  techniques, 
such  as  divide-and-conquer  in  Chapter  4,  dynamic  programming  in  Chapter  15, 
and  amortized  analysis  in  Chapter  17. 

Hard  problems 

Most  of  this  book  is  about  efficient  algorithms.  Our  usual  measure  of  efficiency 
is  speed,  i.e.,  how  long  an  algorithm  takes  to  produce  its  result.  There  are  some 
problems,  however,  for  which  no  efficient  solution  is  known.  Chapter  34  studies 
an  interesting  subset  of  these  problems,  which  are  known  as  NP-complete. 

Why are NP-complete problems interesting? First, although no efficient algorithm
for an NP-complete problem has ever been found, nobody has ever proven
that an efficient algorithm for one cannot exist. In other words, no one knows
whether or not efficient algorithms exist for NP-complete problems. Second, the
set of NP-complete problems has the remarkable property that if an efficient algorithm
exists for any one of them, then efficient algorithms exist for all of them. This
relationship  among  the  NP-complete  problems  makes  the  lack  of  efficient  solutions 
all  the  more  tantalizing.  Third,  several  NP-complete  problems  are  similar,  but  not 
identical,  to  problems  for  which  we  do  know  of  efficient  algorithms.  Computer 
scientists  are  intrigued  by  how  a  small  change  to  the  problem  statement  can  cause 
a  big  change  to  the  efficiency  of  the  best  known  algorithm. 

You should know about NP-complete problems because some of them arise surprisingly
often in real applications. If you are called upon to produce an efficient
algorithm  for  an  NP-complete  problem,  you  are  likely  to  spend  a  lot  of  time  in  a 
fruitless  search.  If  you  can  show  that  the  problem  is  NP-complete,  you  can  instead 
spend  your  time  developing  an  efficient  algorithm  that  gives  a  good,  but  not  the 
best  possible,  solution. 

As  a  concrete  example,  consider  a  delivery  company  with  a  central  depot.  Each 
day,  it  loads  up  each  delivery  truck  at  the  depot  and  sends  it  around  to  deliver  goods 
to  several  addresses.  At  the  end  of  the  day,  each  truck  must  end  up  back  at  the  depot 
so  that  it  is  ready  to  be  loaded  for  the  next  day.  To  reduce  costs,  the  company  wants 
to  select  an  order  of  delivery  stops  that  yields  the  lowest  overall  distance  traveled 
by  each  truck.  This  problem  is  the  well-known  “traveling-salesman  problem,”  and 
it  is  NP-complete.  It  has  no  known  efficient  algorithm.  Under  certain  assumptions, 
however,  we  know  of  efficient  algorithms  that  give  an  overall  distance  which  is 
not  too  far  above  the  smallest  possible.  Chapter  35  discusses  such  “approximation 
algorithms.” 
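
To see why exhaustive search does not scale for the delivery example, consider the following brute-force sketch in Python. It is neither an efficient algorithm nor an approximation algorithm: it simply tries every order of the stops, so a truck with k stops requires examining k! orders. The function name `shortest_tour`, the representation of addresses as coordinate pairs, and the use of straight-line distance are all assumptions made for illustration.

```python
from itertools import permutations
from math import dist

def shortest_tour(depot, stops):
    """Try every order of the stops and return (length, best_order)."""
    best_length, best_order = float("inf"), ()
    for order in permutations(stops):
        # The truck leaves the depot, visits each stop, and returns.
        route = [depot, *order, depot]
        length = sum(dist(a, b) for a, b in zip(route, route[1:]))
        if length < best_length:
            best_length, best_order = length, order
    return best_length, best_order
```

With three stops at the other corners of a unit square whose fourth corner is the depot, the best tour found has length 4. Even twenty stops would already be far beyond the reach of this approach.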

Parallelism 

For  many  years,  we  could  count  on  processor  clock  speeds  increasing  at  a  steady 
rate.  Physical  limitations  present  a  fundamental  roadblock  to  ever-increasing  clock 
speeds,  however:  because  power  density  increases  superlinearly  with  clock  speed, 
chips  run  the  risk  of  melting  once  their  clock  speeds  become  high  enough.  In  order 
to  perform  more  computations  per  second,  therefore,  chips  are  being  designed  to 
contain  not  just  one  but  several  processing  “cores.”  We  can  liken  these  multicore 
computers  to  several  sequential  computers  on  a  single  chip;  in  other  words,  they  are 
a  type  of  “parallel  computer.”  In  order  to  elicit  the  best  performance  from  multicore 
computers,  we  need  to  design  algorithms  with  parallelism  in  mind.  Chapter  27 
presents  a  model  for  “multithreaded”  algorithms,  which  take  advantage  of  multiple 
cores.  This  model  has  advantages  from  a  theoretical  standpoint,  and  it  forms  the 
basis  of  several  successful  computer  programs,  including  a  championship  chess 
program. 




Exercises 


1.1-1 

Give a real-world example that requires sorting or a real-world example that requires
computing a convex hull.


1.1-2 

Other  than  speed,  what  other  measures  of  efficiency  might  one  use  in  a  real-world 
setting? 


1.1-3

Select  a  data  structure  that  you  have  seen  previously,  and  discuss  its  strengths  and 
limitations. 

1.1-4

How  are  the  shortest-path  and  traveling-salesman  problems  given  above  similar? 
How  are  they  different? 


1.1-5 

Come  up  with  a  real-world  problem  in  which  only  the  best  solution  will  do.  Then 
come  up  with  one  in  which  a  solution  that  is  “approximately”  the  best  is  good 
enough. 


1.2  Algorithms  as  a  technology 

Suppose  computers  were  infinitely  fast  and  computer  memory  was  free.  Would 
you  have  any  reason  to  study  algorithms?  The  answer  is  yes,  if  for  no  other  reason 
than  that  you  would  still  like  to  demonstrate  that  your  solution  method  terminates 
and  does  so  with  the  correct  answer. 

If  computers  were  infinitely  fast,  any  correct  method  for  solving  a  problem 
would  do.  You  would  probably  want  your  implementation  to  be  within  the  bounds 
of  good  software  engineering  practice  (for  example,  your  implementation  should 
be  well  designed  and  documented),  but  you  would  most  often  use  whichever 
method  was  the  easiest  to  implement. 

Of  course,  computers  may  be  fast,  but  they  are  not  infinitely  fast.  And  memory 
may  be  inexpensive,  but  it  is  not  free.  Computing  time  is  therefore  a  bounded 
resource,  and  so  is  space  in  memory.  You  should  use  these  resources  wisely,  and 
algorithms that are efficient in terms of time or space will help you do so.

Efficiency

Different  algorithms  devised  to  solve  the  same  problem  often  differ  dramatically  in 
their  efficiency.  These  differences  can  be  much  more  significant  than  differences 
due  to  hardware  and  software. 

As an example, in Chapter 2, we will see two algorithms for sorting. The first,
known as insertion sort, takes time roughly equal to $c_1 n^2$ to sort n items, where $c_1$
is a constant that does not depend on n. That is, it takes time roughly proportional
to $n^2$. The second, merge sort, takes time roughly equal to $c_2 n \lg n$, where $\lg n$
stands for $\log_2 n$ and $c_2$ is another constant that also does not depend on n. Insertion
sort typically has a smaller constant factor than merge sort, so that $c_1 < c_2$.
We shall see that the constant factors can have far less of an impact on the running
time than the dependence on the input size n. Let’s write insertion sort’s running
time as $c_1 n \cdot n$ and merge sort’s running time as $c_2 n \cdot \lg n$. Then we see that where
insertion sort has a factor of n in its running time, merge sort has a factor of $\lg n$,
which is much smaller. (For example, when n = 1000, $\lg n$ is approximately 10,
and when n equals one million, $\lg n$ is approximately only 20.) Although insertion
sort usually runs faster than merge sort for small input sizes, once the input size n
becomes large enough, merge sort’s advantage of $\lg n$ vs. n will more than compensate
for the difference in constant factors. No matter how much smaller $c_1$ is
than $c_2$, there will always be a crossover point beyond which merge sort is faster.
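
The crossover point can be located numerically once values are chosen for the two constants. The Python sketch below is only an illustration: the function name `crossover` is hypothetical, the constants are assumed to be positive, and the running times are taken to be exactly $c_1 n^2$ and $c_2 n \lg n$ rather than roughly so.

```python
from math import log2

def crossover(c1, c2):
    """Return the smallest n >= 3 at which c1*n*n exceeds c2*n*lg(n)."""
    # The ratio n / lg n increases for n >= 3, so once insertion sort's
    # cost exceeds merge sort's, it stays larger for every bigger n.
    n = 3
    while c1 * n * n <= c2 * n * log2(n):
        n += 1
    return n
```

With the hypothetical values $c_1 = 2$ and $c_2 = 50$, `crossover(2, 50)` returns 190: under these assumed costs, insertion sort is at least as fast for inputs of fewer than 190 items, and merge sort is faster from 190 items onward.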

For a concrete example, let us pit a faster computer (computer A) running insertion
sort against a slower computer (computer B) running merge sort. They each
must sort an array of 10 million numbers. (Although 10 million numbers might
seem like a lot, if the numbers are eight-byte integers, then the input occupies
about 80 megabytes, which fits in the memory of even an inexpensive laptop computer
many times over.) Suppose that computer A executes 10 billion instructions
per second (faster than any single sequential computer at the time of this writing)
and computer B executes only 10 million instructions per second, so that computer A
is 1000 times faster than computer B in raw computing power. To make
the difference even more dramatic, suppose that the world’s craftiest programmer
codes insertion sort in machine language for computer A, and the resulting code
requires $2n^2$ instructions to sort n numbers. Suppose further that just an average
programmer implements merge sort, using a high-level language with an inefficient
compiler, with the resulting code taking $50 n \lg n$ instructions. To sort 10 million
numbers, computer A takes


$$\frac{2 \cdot (10^7)^2 \text{ instructions}}{10^{10} \text{ instructions/second}} = 20{,}000 \text{ seconds (more than 5.5 hours)},$$

while computer B takes

$$\frac{50 \cdot 10^7 \lg 10^7 \text{ instructions}}{10^7 \text{ instructions/second}} \approx 1163 \text{ seconds (less than 20 minutes)}.$$


By  using  an  algorithm  whose  running  time  grows  more  slowly,  even  with  a  poor 
compiler, computer B runs more than 17 times faster than computer A! The advantage
of merge sort is even more pronounced when we sort 100 million numbers:
where  insertion  sort  takes  more  than  23  days,  merge  sort  takes  under  four  hours. 
In  general,  as  the  problem  size  increases,  so  does  the  relative  advantage  of  merge 
sort. 
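
The arithmetic in this comparison is easy to reproduce. The following Python sketch uses the instruction counts and machine speeds assumed in the example; the function names are our own, chosen only for illustration.

```python
from math import log2

def insertion_sort_seconds(n, instructions_per_second=10**10):
    # Computer A: 2n^2 instructions at 10 billion instructions per second.
    return 2 * n**2 / instructions_per_second

def merge_sort_seconds(n, instructions_per_second=10**7):
    # Computer B: 50 n lg n instructions at 10 million instructions per second.
    return 50 * n * log2(n) / instructions_per_second

for n in (10**7, 10**8):
    a, b = insertion_sort_seconds(n), merge_sort_seconds(n)
    print(f"n = {n:>11,}: A takes {a:,.0f} s, B takes {b:,.0f} s, ratio {a / b:.1f}")
```

Changing the argument n shows how quickly the gap widens: multiplying the input size by 10 multiplies computer A's time by 100, but computer B's time by only a little more than 10.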


Algorithms  and  other  technologies 

The example above shows that we should consider algorithms, like computer hardware,
as a technology. Total system performance depends on choosing efficient
algorithms  as  much  as  on  choosing  fast  hardware.  Just  as  rapid  advances  are  being 
made  in  other  computer  technologies,  they  are  being  made  in  algorithms  as  well. 

You  might  wonder  whether  algorithms  are  truly  that  important  on  contemporary 
computers  in  light  of  other  advanced  technologies,  such  as 

•  advanced  computer  architectures  and  fabrication  technologies, 

•  easy-to-use,  intuitive,  graphical  user  interfaces  (GUIs), 

•  object-oriented  systems, 

•  integrated  Web  technologies,  and 

•  fast  networking,  both  wired  and  wireless. 

The answer is yes. Although some applications do not explicitly require algorithmic
content at the application level (such as some simple, Web-based applications),
many do. For example, consider a Web-based service that determines how to travel
from one location to another. Its implementation would rely on fast hardware, a
graphical user interface, wide-area networking, and also possibly on object orientation.
However, it would also require algorithms for certain operations, such
as  finding  routes  (probably  using  a  shortest-path  algorithm),  rendering  maps,  and 
interpolating  addresses. 

Moreover,  even  an  application  that  does  not  require  algorithmic  content  at  the 
application  level  relies  heavily  upon  algorithms.  Does  the  application  rely  on  fast 
hardware?  The  hardware  design  used  algorithms.  Does  the  application  rely  on 
graphical  user  interfaces?  The  design  of  any  GUI  relies  on  algorithms.  Does  the 
application  rely  on  networking?  Routing  in  networks  relies  heavily  on  algorithms. 
Was the application written in a language other than machine code? Then it was
processed by a compiler, interpreter, or assembler, all of which make extensive use
of algorithms. Algorithms are at the core of most technologies used in contemporary
computers.

Furthermore,  with  the  ever-increasing  capacities  of  computers,  we  use  them  to 
solve larger problems than ever before. As we saw in the above comparison between
insertion sort and merge sort, it is at larger problem sizes that the differences
in  efficiency  between  algorithms  become  particularly  prominent. 

Having  a  solid  base  of  algorithmic  knowledge  and  technique  is  one  characteristic 
that separates the truly skilled programmers from the novices. With modern computing
technology, you can accomplish some tasks without knowing much about
algorithms,  but  with  a  good  background  in  algorithms,  you  can  do  much,  much 
more. 

Exercises 


1.2-1 

Give an example of an application that requires algorithmic content at the application
level, and discuss the function of the algorithms involved.


1.2-2 

Suppose we are comparing implementations of insertion sort and merge sort on the
same machine. For inputs of size n, insertion sort runs in $8n^2$ steps, while merge
sort runs in $64 n \lg n$ steps. For which values of n does insertion sort beat merge
sort?


1.2-3 

What is the smallest value of n such that an algorithm whose running time is $100n^2$
runs faster than an algorithm whose running time is $2^n$ on the same machine?


Problems 


1-1  Comparison  of  running  times 

For each function f(n) and time t in the following table, determine the largest
size n of a problem that can be solved in time t, assuming that the algorithm to
solve the problem takes f(n) microseconds.

| f(n)        | 1 second | 1 minute | 1 hour | 1 day | 1 month | 1 year |
|-------------|----------|----------|--------|-------|---------|--------|
| $\lg n$     |          |          |        |       |         |        |
| $n \lg n$   |          |          |        |       |         |        |
| $n^2$       |          |          |        |       |         |        |
| $n^3$       |          |          |        |       |         |        |
| $2^n$       |          |          |        |       |         |        |


Chapter notes

There  are  many  excellent  texts  on  the  general  topic  of  algorithms,  including  those 
by  Aho,  Hopcroft,  and  Ullman  [5,  6];  Baase  and  Van  Gelder  [28];  Brassard  and 
Bratley  [54];  Dasgupta,  Papadimitriou,  and  Vazirani  [82];  Goodrich  and  Tamassia 
[148];  Hofri  [175];  Horowitz,  Sahni,  and  Rajasekaran  [181];  Johnsonbaugh  and 
Schaefer  [193];  Kingston  [205];  Kleinberg  and  Tardos  [208];  Knuth  [209,  210, 
211]; Kozen [220]; Levitin [235]; Manber [242]; Mehlhorn [249, 250, 251]; Purdom
and Brown [287]; Reingold, Nievergelt, and Deo [293]; Sedgewick [306];
Sedgewick and Flajolet [307]; Skiena [318]; and Wilf [356]. Some of the more
practical aspects of algorithm design are discussed by Bentley [42, 43] and Gonnet
[145]. Surveys of the field of algorithms can also be found in the Handbook of Theoretical
Computer Science, Volume A [342] and the CRC Algorithms and Theory of
Computation  Handbook  [25].  Overviews  of  the  algorithms  used  in  computational 
biology  can  be  found  in  textbooks  by  Gusfield  [156],  Pevzner  [275],  Setubal  and 
Meidanis [310], and Waterman [350].


Getting Started


This  chapter  will  familiarize  you  with  the  framework  we  shall  use  throughout  the 
book  to  think  about  the  design  and  analysis  of  algorithms.  It  is  self-contained,  but 
it  does  include  several  references  to  material  that  we  introduce  in  Chapters  3  and  4. 
(It  also  contains  several  summations,  which  Appendix  A  shows  how  to  solve.) 

We  begin  by  examining  the  insertion  sort  algorithm  to  solve  the  sorting  problem 
introduced in Chapter 1. We define a “pseudocode” that should be familiar to you if
you  have  done  computer  programming,  and  we  use  it  to  show  how  we  shall  specify 
our  algorithms.  Having  specified  the  insertion  sort  algorithm,  we  then  argue  that  it 
correctly  sorts,  and  we  analyze  its  running  time.  The  analysis  introduces  a  notation 
that  focuses  on  how  that  time  increases  with  the  number  of  items  to  be  sorted. 
Following  our  discussion  of  insertion  sort,  we  introduce  the  divide-and-conquer 
approach  to  the  design  of  algorithms  and  use  it  to  develop  an  algorithm  called 
merge  sort.  We  end  with  an  analysis  of  merge  sort’s  running  time. 


2.1  Insertion  sort 

Our first algorithm, insertion sort, solves the sorting problem introduced in Chapter 1:

Input: A sequence of n numbers $\langle a_1, a_2, \ldots, a_n \rangle$.

Output: A permutation (reordering) $\langle a'_1, a'_2, \ldots, a'_n \rangle$ of the input sequence such
that $a'_1 \le a'_2 \le \cdots \le a'_n$.

The numbers that we wish to sort are also known as the keys. Although conceptually
we are sorting a sequence, the input comes to us in the form of an array with n
elements. 

In this book, we shall typically describe algorithms as programs written in a
pseudocode that is similar in many respects to C, C++, Java, Python, or Pascal. If
you have been introduced to any of these languages, you should have little trouble
reading our algorithms. What separates pseudocode from “real” code is that in
pseudocode, we employ whatever expressive method is most clear and concise to
specify a given algorithm. Sometimes, the clearest method is English, so do not
be surprised if you come across an English phrase or sentence embedded within
a section of “real” code. Another difference between pseudocode and real code
is that pseudocode is not typically concerned with issues of software engineering.
Issues of data abstraction, modularity, and error handling are often ignored in order
to convey the essence of the algorithm more concisely.

Figure 2.1 Sorting a hand of cards using insertion sort.

We  start  with  insertion  sort,  which  is  an  efficient  algorithm  for  sorting  a  small 
number  of  elements.  Insertion  sort  works  the  way  many  people  sort  a  hand  of 
playing  cards.  We  start  with  an  empty  left  hand  and  the  cards  face  down  on  the 
table.  We  then  remove  one  card  at  a  time  from  the  table  and  insert  it  into  the 
correct  position  in  the  left  hand.  To  find  the  correct  position  for  a  card,  we  compare 
it  with  each  of  the  cards  already  in  the  hand,  from  right  to  left,  as  illustrated  in 
Figure  2.1.  At  all  times,  the  cards  held  in  the  left  hand  are  sorted,  and  these  cards 
were  originally  the  top  cards  of  the  pile  on  the  table. 

We present our pseudocode for insertion sort as a procedure called INSERTION-SORT,
which takes as a parameter an array A[1 .. n] containing a sequence of
length n that is to be sorted. (In the code, the number n of elements in A is denoted
by A.length.) The algorithm sorts the input numbers in place: it rearranges the
numbers within the array A, with at most a constant number of them stored outside
the array at any time. The input array A contains the sorted output sequence when
the INSERTION-SORT procedure is finished.

Figure 2.2 The operation of INSERTION-SORT on the array $A = \langle 5, 2, 4, 6, 1, 3 \rangle$. Array indices
appear above the rectangles, and values stored in the array positions appear within the rectangles.
(a)-(e) The iterations of the for loop of lines 1-8. In each iteration, the black rectangle holds the
key taken from A[j], which is compared with the values in shaded rectangles to its left in the test of
line 5. Shaded arrows show array values moved one position to the right in line 6, and black arrows
indicate where the key moves to in line 8. (f) The final sorted array.


INSERTION-SORT(A)
1  for j = 2 to A.length
2      key = A[j]
3      // Insert A[j] into the sorted sequence A[1 .. j - 1].
4      i = j - 1
5      while i > 0 and A[i] > key
6          A[i + 1] = A[i]
7          i = i - 1
8      A[i + 1] = key
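
For readers who would like to run the procedure, here is one possible Python rendering of the pseudocode. It is a sketch rather than a definitive translation: Python lists are indexed from 0 rather than 1, so the outer loop starts at index 1 and the inner test becomes `i >= 0`. The function name `insertion_sort` is our own choice.

```python
def insertion_sort(a):
    """Sort the list a in place, mirroring INSERTION-SORT with 0-based indices."""
    for j in range(1, len(a)):
        key = a[j]
        # Insert a[j] into the sorted prefix a[0 .. j-1].
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]  # shift the larger element one position right
            i -= 1
        a[i + 1] = key
    return a
```

Calling `insertion_sort([5, 2, 4, 6, 1, 3])` rearranges the list into `[1, 2, 3, 4, 5, 6]`. The list is returned only for convenience; the sorting happens in place.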

Loop  invariants  and  the  correctness  of  insertion  sort 

Figure 2.2 shows how this algorithm works for $A = \langle 5, 2, 4, 6, 1, 3 \rangle$. The index j
indicates the “current card” being inserted into the hand. At the beginning
of each iteration of the for loop, which is indexed by j, the subarray consisting
of elements A[1 .. j - 1] constitutes the currently sorted hand, and the remaining
subarray A[j + 1 .. n] corresponds to the pile of cards still on the table. In fact,
elements A[1 .. j - 1] are the elements originally in positions 1 through j - 1, but
now in sorted order. We state these properties of A[1 .. j - 1] formally as a loop
invariant:

At the start of each iteration of the for loop of lines 1-8, the subarray
A[1 .. j - 1] consists of the elements originally in A[1 .. j - 1], but in sorted
order.

We use loop invariants to help us understand why an algorithm is correct. We
must show three things about a loop invariant:

Initialization: It is true prior to the first iteration of the loop.

Maintenance:  If  it  is  true  before  an  iteration  of  the  loop,  it  remains  true  before  the 
next  iteration. 

Termination:  When  the  loop  terminates,  the  invariant  gives  us  a  useful  property 
that  helps  show  that  the  algorithm  is  correct. 

When  the  first  two  properties  hold,  the  loop  invariant  is  true  prior  to  every  iteration 
of  the  loop.  (Of  course,  we  are  free  to  use  established  facts  other  than  the  loop 
invariant  itself  to  prove  that  the  loop  invariant  remains  true  before  each  iteration.) 
Note  the  similarity  to  mathematical  induction,  where  to  prove  that  a  property  holds, 
you  prove  a  base  case  and  an  inductive  step.  Here,  showing  that  the  invariant  holds 
before  the  first  iteration  corresponds  to  the  base  case,  and  showing  that  the  invariant 
holds  from  iteration  to  iteration  corresponds  to  the  inductive  step. 

The  third  property  is  perhaps  the  most  important  one,  since  we  are  using  the  loop 
invariant  to  show  correctness.  Typically,  we  use  the  loop  invariant  along  with  the 
condition  that  caused  the  loop  to  terminate.  The  termination  property  differs  from 
how  we  usually  use  mathematical  induction,  in  which  we  apply  the  inductive  step 
infinitely;  here,  we  stop  the  “induction”  when  the  loop  terminates. 

Let  us  see  how  these  properties  hold  for  insertion  sort. 

Initialization: We start by showing that the loop invariant holds before the first
loop iteration, when j = 2.¹ The subarray A[1 .. j - 1], therefore, consists
of just the single element A[1], which is in fact the original element in A[1].
Moreover, this subarray is sorted (trivially, of course), which shows that the
loop invariant holds prior to the first iteration of the loop.

Maintenance: Next, we tackle the second property: showing that each iteration
maintains the loop invariant. Informally, the body of the for loop works by
moving A[j - 1], A[j - 2], A[j - 3], and so on by one position to the right
until it finds the proper position for A[j] (lines 4-7), at which point it inserts
the value of A[j] (line 8). The subarray A[1 .. j] then consists of the elements
originally in A[1 .. j], but in sorted order. Incrementing j for the next iteration
of the for loop then preserves the loop invariant.

A more formal treatment of the second property would require us to state and
show a loop invariant for the while loop of lines 5-7. At this point, however,
we prefer not to get bogged down in such formalism, and so we rely on our
informal analysis to show that the second property holds for the outer loop.


1 When the loop is a for loop, the moment at which we check the loop invariant just prior to the first
iteration is immediately after the initial assignment to the loop counter variable and just before the
first test in the loop header. In the case of INSERTION-SORT, this time is after assigning 2 to the
variable j but before the first test of whether j ≤ A.length.

Termination: Finally, we examine what happens when the loop terminates. The
condition causing the for loop to terminate is that j > A.length = n. Because
each loop iteration increases j by 1, we must have j = n + 1 at that time.
Substituting n + 1 for j in the wording of the loop invariant, we have that the
subarray A[1 .. n] consists of the elements originally in A[1 .. n], but in sorted
order. Observing that the subarray A[1 .. n] is the entire array, we conclude that
the entire array is sorted. Hence, the algorithm is correct.

We shall use this method of loop invariants to show correctness later in this
chapter and in other chapters as well.
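
A loop invariant can also be checked mechanically while a program runs, which is a useful sanity test even though it proves nothing. The Python sketch below asserts the invariant at the start of every iteration of the outer loop and asserts the termination property at the end. It uses 0-based indices, and the function name `insertion_sort_checked` and the extra copy of the input are our own additions for illustration.

```python
def insertion_sort_checked(a):
    """Insertion sort that asserts the loop invariant before each iteration."""
    original = list(a)  # kept only so that the invariant can be checked
    for j in range(1, len(a)):
        # Invariant: a[0 .. j-1] holds the original a[0 .. j-1], in sorted order.
        assert a[:j] == sorted(original[:j])
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    # Termination: the invariant now covers the entire list.
    assert a == sorted(original)
    return a
```

The checks make each iteration far slower than the sort itself, so a version like this belongs in testing rather than in production code.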

Pseudocode  conventions 

We  use  the  following  conventions  in  our  pseudocode. 

•  Indentation  indicates  block  structure.  For  example,  the  body  of  the  for  loop  that 
begins  on  line  1  consists  of  lines  2-8,  and  the  body  of  the  while  loop  that  begins 
on  line  5  contains  lines  6-7  but  not  line  8.  Our  indentation  style  applies  to 
if-else  statements2  as  well.  Using  indentation  instead  of  conventional  indicators 
of  block  structure,  such  as  begin  and  end  statements,  greatly  reduces  clutter 
while  preserving,  or  even  enhancing,  clarity.3 

•  The  looping  constructs  while,  for,  and  repeat-until  and  the  if-else  conditional 
construct  have  interpretations  similar  to  those  in  C,  C++,  Java,  Python,  and 
Pascal.4  In  this  book,  the  loop  counter  retains  its  value  after  exiting  the  loop, 
unlike  some  situations  that  arise  in  C++,  Java,  and  Pascal.  Thus,  immediately 
after  a  for  loop,  the  loop  counter’s  value  is  the  value  that  first  exceeded  the  for 
loop  bound.  We  used  this  property  in  our  correctness  argument  for  insertion 
sort. The for loop header in line 1 is for j = 2 to A.length, and so when
this loop terminates, j = A.length + 1 (or, equivalently, j = n + 1, since
n = A.length). We use the keyword to when a for loop increments its loop


2 In an if-else statement, we indent else at the same level as its matching if. Although we omit the
keyword  then,  we  occasionally  refer  to  the  portion  executed  when  the  test  following  if  is  true  as  a 
then  clause.  For  multiway  tests,  we  use  elseif  for  tests  after  the  first  one. 

3  Each  pseudocode  procedure  in  this  book  appears  on  one  page  so  that  you  will  not  have  to  discern 
levels  of  indentation  in  code  that  is  split  across  pages. 

4 Most block-structured languages have equivalent constructs, though the exact syntax may differ.
Python lacks repeat-until loops, and its for loops operate a little differently from the for loops in
this  book. 




counter  in  each  iteration,  and  we  use  the  keyword  downto  when  a  for  loop 
decrements  its  loop  counter.  When  the  loop  counter  changes  by  an  amount 
greater than 1, the amount of change follows the optional keyword by.

•  The  symbol  “//”  indicates  that  the  remainder  of  the  line  is  a  comment. 

•  A  multiple  assignment  of  the  form  i  =  j  =  e  assigns  to  both  variables  i  and  j 
the value of expression e; it should be treated as equivalent to the assignment
j = e followed by the assignment i = j.

• Variables (such as i, j, and key) are local to the given procedure. We shall not
use  global  variables  without  explicit  indication. 

• We access array elements by specifying the array name followed by the index
in square brackets. For example, A[i] indicates the ith element of the
array A. The notation “. .” is used to indicate a range of values within an array.
Thus, A[1 . . j] indicates the subarray of A consisting of the j elements
A[1], A[2], ..., A[j].

•  We  typically  organize  compound  data  into  objects,  which  are  composed  of 
attributes.  We  access  a  particular  attribute  using  the  syntax  found  in  many 
object-oriented  programming  languages:  the  object  name,  followed  by  a  dot, 
followed  by  the  attribute  name.  For  example,  we  treat  an  array  as  an  object 
with  the  attribute  length  indicating  how  many  elements  it  contains.  To  specify 
the number of elements in an array A, we write A.length.

We treat a variable representing an array or object as a pointer to the data representing
the array or object. For all attributes f of an object x, setting y = x
causes y.f to equal x.f. Moreover, if we now set x.f = 3, then afterward not
only  does  x.f  equal  3,  but  y.f  equals  3  as  well.  In  other  words,  x  and  y  point 
to  the  same  object  after  the  assignment  y  =  x. 

Our attribute notation can “cascade.” For example, suppose that the attribute f
is  itself  a  pointer  to  some  type  of  object  that  has  an  attribute  g.  Then  the  notation 
x.f.g  is  implicitly  parenthesized  as  (x.f).g.  In  other  words,  if  we  had  assigned 
y  =  x.f,  then  x.f.g  is  the  same  as  y.g. 

Sometimes,  a  pointer  will  refer  to  no  object  at  all.  In  this  case,  we  give  it  the 
special  value  NIL. 

• We pass parameters to a procedure by value: the called procedure receives its
own copy of the parameters, and if it assigns a value to a parameter, the change
is not seen by the calling procedure. When objects are passed, the pointer to
the data representing the object is copied, but the object’s attributes are not. For
example, if x is a parameter of a called procedure, the assignment x = y within
the called procedure is not visible to the calling procedure. The assignment
x.f = 3, however, is visible. Similarly, arrays are passed by pointer, so that
a pointer to the array is passed, rather than the entire array, and changes to
individual array elements are visible to the calling procedure.

•  A  return  statement  immediately  transfers  control  back  to  the  point  of  call  in 
the  calling  procedure.  Most  return  statements  also  take  a  value  to  pass  back  to 
the  caller.  Our  pseudocode  differs  from  many  programming  languages  in  that 
we  allow  multiple  values  to  be  returned  in  a  single  return  statement. 

•  The  boolean  operators  “and”  and  “or”  are  short  circuiting.  That  is,  when  we 
evaluate  the  expression  “x  and  y”  we  first  evaluate  x.  If  x  evaluates  to  FALSE, 
then  the  entire  expression  cannot  evaluate  to  TRUE,  and  so  we  do  not  evaluate  y. 
If,  on  the  other  hand,  x  evaluates  to  TRUE,  we  must  evaluate  y  to  determine  the 
value of the entire expression. Similarly, in the expression “x or y” we evaluate
the expression y only if x evaluates to FALSE. Short-circuiting operators
allow us to write boolean expressions such as “x ≠ NIL and x.f = y” without
worrying  about  what  happens  when  we  try  to  evaluate  x.f  when  x  is  NIL. 

•  The  keyword  error  indicates  that  an  error  occurred  because  conditions  were 
wrong for the procedure to have been called. The calling procedure is responsible
for handling the error, and so we do not specify what action to take.
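
Several of the conventions above have close Python counterparts, which the following sketch illustrates. It is an illustration under assumptions, not part of the pseudocode: the class `Node` and all function names are hypothetical, and Python's `None` stands in for NIL. One difference is worth noting: after a Python `for j in range(2, n + 1)` loop, j holds the last value used (n), not the value that first exceeded the bound, so the sketch mimics the pseudocode's for loop with a while loop.

```python
class Node:
    """Hypothetical object with a single attribute f."""

    def __init__(self, f=None):
        self.f = f


def loop_counter_after_for(n):
    """Mimic 'for j = 2 to n'; return j as it stands after the loop exits."""
    j = 2
    while j <= n:
        # (the loop body would go here)
        j += 1
    return j  # the value that first exceeded the bound n


def rebind_parameter(x, y):
    x = y  # like 'x = y' in a called procedure: invisible to the caller


def set_attribute(x):
    x.f = 3  # like 'x.f = 3' in a called procedure: visible to the caller


def has_value(x, y):
    # Short-circuit 'and': x.f is evaluated only when x is not None (NIL).
    return x is not None and x.f == y


# Pointer semantics: after y = x, both names refer to the same object.
x = Node(f=1)
y = x
x.f = 3
aliased = (y.f == 3)

# Parameter passing: rebinding is local, attribute updates are shared.
p, q = Node(f=10), Node(f=20)
rebind_parameter(p, q)
still_ten = (p.f == 10)
set_attribute(p)
now_three = (p.f == 3)
```

Here `loop_counter_after_for(5)` returns 6, matching the claim that j = n + 1 when the for loop terminates, and `has_value(None, 3)` returns False without ever touching an attribute of `None`.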

Exercises 


2.1-1 

Using  Figure  2.2  as  a  model,  illustrate  the  operation  of  Insertion-Sort  on  the 
array A = ⟨31, 41, 59, 26, 41, 58⟩.


2.1-2 

Rewrite the INSERTION-SORT procedure to sort into nonincreasing instead of
nondecreasing order.


2.1-3 

Consider the searching problem:

Input: A sequence of n numbers A = ⟨a_1, a_2, ..., a_n⟩ and a value v.

Output: An index i such that v = A[i] or the special value NIL if v does not
appear in A.

Write pseudocode for linear search, which scans through the sequence, looking
for  v.  Using  a  loop  invariant,  prove  that  your  algorithm  is  correct.  Make  sure  that 
your  loop  invariant  fulfills  the  three  necessary  properties. 


2.1-4 

Consider the problem of adding two n-bit binary integers, stored in two n-element
arrays A and B. The sum of the two integers should be stored in binary form in
an (n + 1)-element array C. State the problem formally and write pseudocode for
adding the two integers.


2.2  Analyzing  algorithms 

Analyzing an algorithm has come to mean predicting the resources that the algorithm
requires. Occasionally, resources such as memory, communication bandwidth,
or computer hardware are of primary concern, but most often it is computational
time that we want to measure. Generally, by analyzing several candidate
algorithms  for  a  problem,  we  can  identify  a  most  efficient  one.  Such  analysis  may 
indicate  more  than  one  viable  candidate,  but  we  can  often  discard  several  inferior 
algorithms  in  the  process. 

Before we can analyze an algorithm, we must have a model of the implementation
technology that we will use, including a model for the resources of that
technology and their costs. For most of this book, we shall assume a generic
one-processor, random-access machine (RAM) model of computation as our
implementation technology and understand that our algorithms will be implemented as
computer programs. In the RAM model, instructions are executed one after another,
with no concurrent operations.

Strictly  speaking,  we  should  precisely  define  the  instructions  of  the  RAM  model 
and  their  costs.  To  do  so,  however,  would  be  tedious  and  would  yield  little  insight 
into  algorithm  design  and  analysis.  Yet  we  must  be  careful  not  to  abuse  the  RAM 
model.  For  example,  what  if  a  RAM  had  an  instruction  that  sorts?  Then  we  could 
sort  in  just  one  instruction.  Such  a  RAM  would  be  unrealistic,  since  real  computers 
do not have such instructions. Our guide, therefore, is how real computers are designed.
The RAM model contains instructions commonly found in real computers:
arithmetic  (such  as  add,  subtract,  multiply,  divide,  remainder,  floor,  ceiling),  data 
movement  (load,  store,  copy),  and  control  (conditional  and  unconditional  branch, 
subroutine  call  and  return).  Each  such  instruction  takes  a  constant  amount  of  time. 

The  data  types  in  the  RAM  model  are  integer  and  floating  point  (for  storing  real 
numbers).  Although  we  typically  do  not  concern  ourselves  with  precision  in  this 
book,  in  some  applications  precision  is  crucial.  We  also  assume  a  limit  on  the  size 
of each word of data. For example, when working with inputs of size n, we typically
assume that integers are represented by c lg n bits for some constant c ≥ 1.
We require c ≥ 1 so that each word can hold the value of n, enabling us to index the
individual  input  elements,  and  we  restrict  c  to  be  a  constant  so  that  the  word  size 
does  not  grow  arbitrarily.  (If  the  word  size  could  grow  arbitrarily,  we  could  store 
huge  amounts  of  data  in  one  word  and  operate  on  it  all  in  constant  time— clearly 
an unrealistic scenario.)

Real computers contain instructions not listed above, and such instructions represent
a gray area in the RAM model. For example, is exponentiation a constant-time
instruction? In the general case, no; it takes several instructions to compute x^y
when x and y are real numbers. In restricted situations, however, exponentiation is
a constant-time operation. Many computers have a “shift left” instruction, which
in constant time shifts the bits of an integer by k positions to the left. In most
computers, shifting the bits of an integer by one position to the left is equivalent
to multiplication by 2, so that shifting the bits by k positions to the left is equivalent
to multiplication by 2^k. Therefore, such computers can compute 2^k in one
constant-time instruction by shifting the integer 1 by k positions to the left, as long
as k is no more than the number of bits in a computer word. We will endeavor to
avoid such gray areas in the RAM model, but we will treat computation of 2^k as a
constant-time operation when k is a small enough positive integer.
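
As a small illustration of the shift-left idea, here is a Python sketch. The function name and the `word_bits` parameter (defaulting to 64) are assumptions made for this example; Python integers are arbitrary precision, so the bound is enforced explicitly, and conservatively, to imitate a fixed-size machine word.

```python
def power_of_two(k, word_bits=64):
    """Return 2**k using a single left shift, for 0 <= k < word_bits."""
    if not 0 <= k < word_bits:
        # Outside this range the result would not fit in the assumed word.
        raise ValueError("k must fit within the assumed word size")
    return 1 << k  # shifting 1 left by k positions multiplies it by 2**k
```

For example, `power_of_two(10)` returns 1024.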

In  the  RAM  model,  we  do  not  attempt  to  model  the  memory  hierarchy  that  is 
common  in  contemporary  computers.  That  is,  we  do  not  model  caches  or  virtual 
memory.  Several  computational  models  attempt  to  account  for  memory-hierarchy 
effects,  which  are  sometimes  significant  in  real  programs  on  real  machines.  A 
handful  of  problems  in  this  book  examine  memory-hierarchy  effects,  but  for  the 
most  part,  the  analyses  in  this  book  will  not  consider  them.  Models  that  include 
the  memory  hierarchy  are  quite  a  bit  more  complex  than  the  RAM  model,  and  so 
they  can  be  difficult  to  work  with.  Moreover,  RAM-model  analyses  are  usually 
excellent  predictors  of  performance  on  actual  machines. 

Analyzing  even  a  simple  algorithm  in  the  RAM  model  can  be  a  challenge.  The 
mathematical tools required may include combinatorics, probability theory, algebraic
dexterity, and the ability to identify the most significant terms in a formula.
Because  the  behavior  of  an  algorithm  may  be  different  for  each  possible  input,  we 
need  a  means  for  summarizing  that  behavior  in  simple,  easily  understood  formulas. 

Even though we typically select only one machine model to analyze a given algorithm,
we still face many choices in deciding how to express our analysis. We
would like a way that is simple to write and manipulate, shows the important characteristics
of an algorithm’s resource requirements, and suppresses tedious details.

Analysis  of  insertion  sort 

The  time  taken  by  the  INSERTION-SORT  procedure  depends  on  the  input:  sorting  a 
thousand numbers takes longer than sorting three numbers. Moreover, INSERTION-SORT
can take different amounts of time to sort two input sequences of the same
size  depending  on  how  nearly  sorted  they  already  are.  In  general,  the  time  taken 
by  an  algorithm  grows  with  the  size  of  the  input,  so  it  is  traditional  to  describe  the 
running  time  of  a  program  as  a  function  of  the  size  of  its  input.  To  do  so,  we  need 
to define the terms “running time” and “size of input” more carefully.

The best notion for input size depends on the problem being studied. For many
problems, such as sorting or computing discrete Fourier transforms, the most natural
measure is the number of items in the input (for example, the array size n
for sorting). For many other problems, such as multiplying two integers, the best
measure of input size is the total number of bits needed to represent the input in
ordinary binary notation. Sometimes, it is more appropriate to describe the size of
the input with two numbers rather than one. For instance, if the input to an algorithm
is a graph, the input size can be described by the numbers of vertices and
edges  in  the  graph.  We  shall  indicate  which  input  size  measure  is  being  used  with 
each  problem  we  study. 

The  running  time  of  an  algorithm  on  a  particular  input  is  the  number  of  primitive 
operations  or  “steps”  executed.  It  is  convenient  to  define  the  notion  of  step  so 
that  it  is  as  machine-independent  as  possible.  For  the  moment,  let  us  adopt  the 
following  view.  A  constant  amount  of  time  is  required  to  execute  each  line  of  our 
pseudocode.  One  line  may  take  a  different  amount  of  time  than  another  line,  but 
we shall assume that each execution of the ith line takes time c_i, where c_i is a
constant.  This  viewpoint  is  in  keeping  with  the  RAM  model,  and  it  also  reflects 
how  the  pseudocode  would  be  implemented  on  most  actual  computers.5 

In the following discussion, our expression for the running time of INSERTION-SORT
will evolve from a messy formula that uses all the statement costs c_i to a
much  simpler  notation  that  is  more  concise  and  more  easily  manipulated.  This 
simpler  notation  will  also  make  it  easy  to  determine  whether  one  algorithm  is  more 
efficient  than  another. 

We  start  by  presenting  the  Insertion-Sort  procedure  with  the  time  “cost” 
of  each  statement  and  the  number  of  times  each  statement  is  executed.  For  each 
j = 2, 3, ..., n, where n = A.length, we let t_j denote the number of times the
while loop test in line 5 is executed for that value of j. When a for or while loop
exits  in  the  usual  way  (i.e.,  due  to  the  test  in  the  loop  header),  the  test  is  executed 
one  time  more  than  the  loop  body.  We  assume  that  comments  are  not  executable 
statements,  and  so  they  take  no  time. 


5 There are some subtleties here. Computational steps that we specify in English are often variants
of a procedure that requires more than just a constant amount of time. For example, later in this
book we might say “sort the points by x coordinate,” which, as we shall see, takes more than a
constant amount of time. Also, note that a statement that calls a subroutine takes constant time,
though the subroutine, once invoked, may take more. That is, we separate the process of calling the
subroutine (passing parameters to it, etc.) from the process of executing the subroutine.

INSERTION-SORT(A)                                  cost    times
1  for j = 2 to A.length                           c_1     n
2      key = A[j]                                  c_2     n − 1
3      // Insert A[j] into the sorted
       //     sequence A[1 . . j − 1].             0       n − 1
4      i = j − 1                                   c_4     n − 1
5      while i > 0 and A[i] > key                  c_5     Σ_{j=2}^{n} t_j
6          A[i + 1] = A[i]                         c_6     Σ_{j=2}^{n} (t_j − 1)
7          i = i − 1                               c_7     Σ_{j=2}^{n} (t_j − 1)
8      A[i + 1] = key                              c_8     n − 1

The running time of the algorithm is the sum of running times for each statement
executed; a statement that takes c_i steps to execute and executes n times will
contribute c_i n to the total running time.6 To compute T(n), the running time of
INSERTION-SORT on an input of n values, we sum the products of the cost and
times columns, obtaining

\[
\begin{aligned}
T(n) &= c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \sum_{j=2}^{n} t_j + c_6 \sum_{j=2}^{n} (t_j - 1) \\
     &\qquad + c_7 \sum_{j=2}^{n} (t_j - 1) + c_8 (n-1) .
\end{aligned}
\]

Even  for  inputs  of  a  given  size,  an  algorithm’s  running  time  may  depend  on 
which input of that size is given. For example, in INSERTION-SORT, the best
case occurs if the array is already sorted. For each j = 2, 3, ..., n, we then find
that A[i] ≤ key in line 5 when i has its initial value of j − 1. Thus t_j = 1 for
j = 2, 3, ..., n, and the best-case running time is

\[
\begin{aligned}
T(n) &= c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 (n-1) + c_8 (n-1) \\
     &= (c_1 + c_2 + c_4 + c_5 + c_8) n - (c_2 + c_4 + c_5 + c_8) .
\end{aligned}
\]

We can express this running time as an + b for constants a and b that depend on
the statement costs c_i; it is thus a linear function of n.

If the array is in reverse sorted order (that is, in decreasing order), the worst
case results. We must compare each element A[j] with each element in the entire
sorted subarray A[1 . . j − 1], and so t_j = j for j = 2, 3, ..., n. Noting that


6 This characteristic does not necessarily hold for a resource such as memory. A statement that
references m words of memory and is executed n times does not necessarily reference mn distinct
words of memory.

\[
\sum_{j=2}^{n} j = \frac{n(n+1)}{2} - 1
\]

and

\[
\sum_{j=2}^{n} (j-1) = \frac{n(n-1)}{2}
\]

(see  Appendix  A  for  a  review  of  how  to  solve  these  summations),  we  find  that  in 
the  worst  case,  the  running  time  of  Insertion-Sort  is 

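\[
\begin{aligned}
T(n) &= c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \left( \frac{n(n+1)}{2} - 1 \right) \\
     &\qquad + c_6 \left( \frac{n(n-1)}{2} \right) + c_7 \left( \frac{n(n-1)}{2} \right) + c_8 (n-1) \\
     &= \left( \frac{c_5}{2} + \frac{c_6}{2} + \frac{c_7}{2} \right) n^2
        + \left( c_1 + c_2 + c_4 + \frac{c_5}{2} - \frac{c_6}{2} - \frac{c_7}{2} + c_8 \right) n \\
     &\qquad - (c_2 + c_4 + c_5 + c_8) .
\end{aligned}
\]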

We can express this worst-case running time as an^2 + bn + c for constants a, b,
and c that again depend on the statement costs c_i; it is thus a quadratic function
of n.
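
The values of t_j in the best and worst cases can be observed directly by instrumenting the procedure. The sketch below is an illustration rather than the book's code: it uses 0-based indices, and the function name `while_test_counts` and the sample size n = 8 are choices made here.

```python
def while_test_counts(a):
    """Run insertion sort on a copy of a; return the list [t_2, ..., t_n]."""
    a = list(a)
    counts = []
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        tests = 1  # the first evaluation of the while-loop test
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
            tests += 1  # the test is evaluated again after each shift
        a[i + 1] = key
        counts.append(tests)
    return counts


n = 8
best = while_test_counts(range(1, n + 1))   # already sorted input
worst = while_test_counts(range(n, 0, -1))  # reverse sorted input
```

With n = 8, `best` consists of seven 1s, and `worst` is the list 2, 3, ..., 8, whose sum equals n(n + 1)/2 − 1 = 35, in agreement with the summation used in the worst-case analysis.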

Typically,  as  in  insertion  sort,  the  running  time  of  an  algorithm  is  fixed  for  a 
given  input,  although  in  later  chapters  we  shall  see  some  interesting  “randomized” 
algorithms  whose  behavior  can  vary  even  for  a  fixed  input. 

Worst-case  and  average-case  analysis 

In  our  analysis  of  insertion  sort,  we  looked  at  both  the  best  case,  in  which  the  input 
array  was  already  sorted,  and  the  worst  case,  in  which  the  input  array  was  reverse 
sorted.  For  the  remainder  of  this  book,  though,  we  shall  usually  concentrate  on 
finding only the worst-case running time, that is, the longest running time for any
input  of  size  n.  We  give  three  reasons  for  this  orientation. 

•  The  worst-case  running  time  of  an  algorithm  gives  us  an  upper  bound  on  the 
running  time  for  any  input.  Knowing  it  provides  a  guarantee  that  the  algorithm 
will  never  take  any  longer.  We  need  not  make  some  educated  guess  about  the 
running  time  and  hope  that  it  never  gets  much  worse. 

• For some algorithms, the worst case occurs fairly often. For example, in searching
a database for a particular piece of information, the searching algorithm’s
worst case will often occur when the information is not present in the database.
In some applications, searches for absent information may be frequent.

• The “average case” is often roughly as bad as the worst case. Suppose that we
randomly choose n numbers and apply insertion sort. How long does it take to
determine where in subarray A[1 . . j − 1] to insert element A[j]? On average,
half the elements in A[1 . . j − 1] are less than A[j], and half the elements are
greater. On average, therefore, we check half of the subarray A[1 . . j − 1], and
so t_j is about j/2. The resulting average-case running time turns out to be a
quadratic  function  of  the  input  size,  just  like  the  worst-case  running  time. 
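
For small n, the average-case claim in the last item can be checked exhaustively. The following sketch assumes distinct keys with every permutation equally likely, and the function names are hypothetical. It averages the total number of while-loop tests (the sum of the t_j) over all permutations of n keys.

```python
from fractions import Fraction
from itertools import permutations


def total_while_tests(a):
    """Return the sum of t_j when insertion sort runs on a copy of a."""
    a = list(a)
    total = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        total += 1  # first evaluation of the while-loop test
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
            total += 1  # one more evaluation after each shift
        a[i + 1] = key
    return total


def average_while_tests(n):
    """Exact average of the sum of t_j over all n! permutations of n keys."""
    perms = list(permutations(range(n)))
    return Fraction(sum(total_while_tests(p) for p in perms), len(perms))
```

For n = 6 the average is 25/2, compared with 20 in the worst case: smaller by a constant factor, but still growing quadratically with n. Because the enumeration visits n! permutations, this check is practical only for very small n.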

In  some  particular  cases,  we  shall  be  interested  in  the  average-case  running  time 
of  an  algorithm;  we  shall  see  the  technique  of  probabilistic  analysis  applied  to 
various  algorithms  throughout  this  book.  The  scope  of  average-case  analysis  is 
limited,  because  it  may  not  be  apparent  what  constitutes  an  “average”  input  for 
a  particular  problem.  Often,  we  shall  assume  that  all  inputs  of  a  given  size  are 
equally  likely.  In  practice,  this  assumption  may  be  violated,  but  we  can  sometimes 
use a randomized algorithm, which makes random choices, to allow a probabilistic
analysis  and  yield  an  expected  running  time.  We  explore  randomized  algorithms 
more  in  Chapter  5  and  in  several  other  subsequent  chapters. 

Order  of  growth 

We used some simplifying abstractions to ease our analysis of the INSERTION-SORT
procedure. First, we ignored the actual cost of each statement, using the
constants c_i to represent these costs. Then, we observed that even these constants
give us more detail than we really need: we expressed the worst-case running time
as an^2 + bn + c for some constants a, b, and c that depend on the statement
costs c_i. We thus ignored not only the actual statement costs, but also the abstract
costs c_i.

We shall now make one more simplifying abstraction: it is the rate of growth,
or order of growth, of the running time that really interests us. We therefore consider
only the leading term of a formula (e.g., an^2), since the lower-order terms are
relatively insignificant for large values of n. We also ignore the leading term’s constant
coefficient, since constant factors are less significant than the rate of growth
in determining computational efficiency for large inputs. For insertion sort, when
we ignore the lower-order terms and the leading term’s constant coefficient, we are
left with the factor of n^2 from the leading term. We write that insertion sort has a
worst-case running time of Θ(n^2) (pronounced “theta of n-squared”). We shall use
Θ-notation informally in this chapter, and we will define it precisely in Chapter 3.

We usually consider one algorithm to be more efficient than another if its worst-case
running time has a lower order of growth. Due to constant factors and lower-order
terms, an algorithm whose running time has a higher order of growth might
take less time for small inputs than an algorithm whose running time has a lower
order of growth. But for large enough inputs, a Θ(n^2) algorithm, for example, will
run more quickly in the worst case than a Θ(n^3) algorithm.
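
To make the role of constant factors concrete, consider two hypothetical cost functions, a·n^2 and b·n^3. The coefficients a = 100 and b = 1 below are invented for illustration and do not describe any particular algorithm. The sketch finds the smallest input size at which the quadratic cost becomes strictly lower.

```python
def first_n_where_quadratic_wins(a=100, b=1):
    """Return the smallest n >= 1 with a*n**2 < b*n**3 (a, b positive)."""
    n = 1
    # While the cubic cost is no greater than the quadratic cost, keep going.
    while a * n ** 2 >= b * n ** 3:
        n += 1
    return n
```

With these coefficients the crossover is at n = 101: for smaller inputs the cubic cost is no greater, and from that point on the quadratic cost is always lower.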

Exercises 


2.2-1 

Express the function n^3/1000 − 100n^2 − 100n + 3 in terms of Θ-notation.


2.2-2 

Consider sorting n numbers stored in array A by first finding the smallest element
of A and exchanging it with the element in A[1]. Then find the second smallest
element of A, and exchange it with A[2]. Continue in this manner for the first n − 1
elements of A. Write pseudocode for this algorithm, which is known as selection
sort. What loop invariant does this algorithm maintain? Why does it need to run
for only the first n − 1 elements, rather than for all n elements? Give the best-case
and worst-case running times of selection sort in Θ-notation.


2.2-3

Consider linear search again (see Exercise 2.1-3). How many elements of the input
sequence need to be checked on the average, assuming that the element being
searched for is equally likely to be any element in the array? How about in the
worst case? What are the average-case and worst-case running times of linear
search in Θ-notation? Justify your answers.

2.2-4

How  can  we  modify  almost  any  algorithm  to  have  a  good  best-case  running  time? 


2.3  Designing  algorithms 

We  can  choose  from  a  wide  range  of  algorithm  design  techniques.  For  insertion 
sort, we used an incremental approach: having sorted the subarray A[1 . . j − 1],
we inserted the single element A[j] into its proper place, yielding the sorted
subarray A[1 . . j].

In this section, we examine an alternative design approach, known as “divide-and-conquer,”
which we shall explore in more detail in Chapter 4. We’ll use divide-and-conquer
to design a sorting algorithm whose worst-case running time is much
less than that of insertion sort. One advantage of divide-and-conquer algorithms is
that their running times are often easily determined using techniques that we will
see in Chapter 4.

2.3.1 The divide-and-conquer approach

Many  useful  algorithms  are  recursive  in  structure:  to  solve  a  given  problem,  they 
call themselves recursively one or more times to deal with closely related subproblems.
These algorithms typically follow a divide-and-conquer approach: they
break the problem into several subproblems that are similar to the original problem
but smaller in size, solve the subproblems recursively, and then combine these
solutions  to  create  a  solution  to  the  original  problem. 

The divide-and-conquer paradigm involves three steps at each level of the recursion:

Divide  the  problem  into  a  number  of  subproblems  that  are  smaller  instances  of  the 
same  problem. 

Conquer  the  subproblems  by  solving  them  recursively.  If  the  subproblem  sizes  are 
small  enough,  however,  just  solve  the  subproblems  in  a  straightforward  manner. 

Combine the solutions to the subproblems into the solution for the original problem.

The merge sort algorithm closely follows the divide-and-conquer paradigm. Intuitively,
it operates as follows.

Divide: Divide the n-element sequence to be sorted into two subsequences of n/2
elements  each. 

Conquer:  Sort  the  two  subsequences  recursively  using  merge  sort. 

Combine:  Merge  the  two  sorted  subsequences  to  produce  the  sorted  answer. 

The recursion “bottoms out” when the sequence to be sorted has length 1, in which
case  there  is  no  work  to  be  done,  since  every  sequence  of  length  1  is  already  in 
sorted  order. 
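
The three steps can be sketched compactly in Python. This is an illustrative sketch under assumptions, not the in-place procedure developed in this section: it returns a new list, the function name `merge_sort` is chosen here, and the combine step borrows the standard library's `heapq.merge` so that the structure of the recursion stands out.

```python
from heapq import merge


def merge_sort(seq):
    """Return a new sorted list built by divide-and-conquer."""
    if len(seq) <= 1:
        # The recursion bottoms out: length 0 or 1 is already sorted.
        return list(seq)
    mid = len(seq) // 2               # Divide into two halves.
    left = merge_sort(seq[:mid])      # Conquer each half recursively.
    right = merge_sort(seq[mid:])
    return list(merge(left, right))   # Combine the two sorted halves.
```

For example, `merge_sort([5, 2, 4, 7, 1, 3, 2, 6])` returns `[1, 2, 2, 3, 4, 5, 6, 7]`.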

The  key  operation  of  the  merge  sort  algorithm  is  the  merging  of  two  sorted 
sequences  in  the  “combine”  step.  We  merge  by  calling  an  auxiliary  procedure 
MERGE(A, p, q, r), where A is an array and p, q, and r are indices into the array
such that p ≤ q < r. The procedure assumes that the subarrays A[p . . q] and
A[q  +  1  . .  r]  are  in  sorted  order.  It  merges  them  to  form  a  single  sorted  subarray 
that  replaces  the  current  subarray  A[p  . .  r]. 

Our MERGE procedure takes time Θ(n), where n = r − p + 1 is the total
number of elements being merged, and it works as follows. Returning to our card-playing
motif, suppose we have two piles of cards face up on a table. Each pile is
sorted, with the smallest cards on top. We wish to merge the two piles into a single
sorted output pile, which is to be face down on the table. Our basic step consists
of choosing the smaller of the two cards on top of the face-up piles, removing it
from its pile (which exposes a new top card), and placing this card face down onto
the output pile. We repeat this step until one input pile is empty, at which time
we just take the remaining input pile and place it face down onto the output pile.
Computationally, each basic step takes constant time, since we are comparing just
the two top cards. Since we perform at most n basic steps, merging takes Θ(n)
time.
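
The card-pile description translates almost line for line into code. The sketch below is an illustration only: the name `merge_piles` is hypothetical, the piles are Python lists with the "top" card at the front, and the output is a new list rather than a rewritten subarray.

```python
def merge_piles(left, right):
    """Merge two sorted lists into one sorted list, as in the card analogy."""
    out = []
    i = j = 0  # positions of the current top card in each pile
    # Basic step: move the smaller of the two top cards to the output pile.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    # One pile is now empty; place the rest of the other pile on the output.
    out.extend(left[i:])
    out.extend(right[j:])
    return out
```

Each element is moved to the output exactly once, which is the source of the linear running time.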

The  following  pseudocode  implements  the  above  idea,  but  with  an  additional 
twist  that  avoids  having  to  check  whether  either  pile  is  empty  in  each  basic  step. 
We  place  on  the  bottom  of  each  pile  a  sentinel  card,  which  contains  a  special  value 
that we use to simplify our code. Here, we use ∞ as the sentinel value, so that
whenever a card with ∞ is exposed, it cannot be the smaller card unless both piles
have their sentinel cards exposed. But once that happens, all the nonsentinel cards
have already been placed onto the output pile. Since we know in advance that
exactly r − p + 1 cards will be placed onto the output pile, we can stop once we
have  performed  that  many  basic  steps. 

MERGE(A, p, q, r)
 1  n₁ = q − p + 1
 2  n₂ = r − q
 3  let L[1..n₁ + 1] and R[1..n₂ + 1] be new arrays
 4  for i = 1 to n₁
 5      L[i] = A[p + i − 1]
 6  for j = 1 to n₂
 7      R[j] = A[q + j]
 8  L[n₁ + 1] = ∞
 9  R[n₂ + 1] = ∞
10  i = 1
11  j = 1
12  for k = p to r
13      if L[i] ≤ R[j]
14          A[k] = L[i]
15          i = i + 1
16      else A[k] = R[j]
17          j = j + 1
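
For readers who want to run the sentinel-based merge, the following is a minimal Python sketch of the MERGE pseudocode. It is an illustration under stated assumptions, not a definitive implementation: it uses 0-based, inclusive indices p, q, r (the pseudocode is 1-based), it assumes the keys are finite numbers so that `math.inf` can serve as the sentinel ∞, and the name `merge` is our own.

```python
import math

def merge(A, p, q, r):
    """Merge sorted A[p..q] and A[q+1..r] (0-based, inclusive) in place."""
    L = A[p:q + 1] + [math.inf]        # left run plus sentinel
    R = A[q + 1:r + 1] + [math.inf]    # right run plus sentinel
    i = j = 0
    for k in range(p, r + 1):          # exactly r - p + 1 basic steps
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1

cards = [2, 4, 5, 7, 1, 2, 3, 6]       # two sorted runs of length 4
merge(cards, 0, 3, 7)
print(cards)  # [1, 2, 2, 3, 4, 5, 6, 7]
```

Because a sentinel can be chosen only when both runs are exhausted, the loop never needs an explicit emptiness test.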


Figure 2.3 The operation of lines 10-17 in the call MERGE(A, 9, 12, 16), when the subarray
A[9..16] contains the sequence ⟨2, 4, 5, 7, 1, 2, 3, 6⟩. After copying and inserting sentinels, the
array L contains ⟨2, 4, 5, 7, ∞⟩, and the array R contains ⟨1, 2, 3, 6, ∞⟩. Lightly shaded positions
in A contain their final values, and lightly shaded positions in L and R contain values that have yet
to be copied back into A. Taken together, the lightly shaded positions always comprise the values
originally in A[9..16], along with the two sentinels. Heavily shaded positions in A contain values
that will be copied over, and heavily shaded positions in L and R contain values that have already
been copied back into A. (a)-(h) The arrays A, L, and R, and their respective indices k, i, and j
prior to each iteration of the loop of lines 12-17.

In detail, the MERGE procedure works as follows. Line 1 computes the length n₁
of the subarray A[p..q], and line 2 computes the length n₂ of the subarray
A[q + 1..r]. We create arrays L and R (“left” and “right”), of lengths n₁ + 1
and n₂ + 1, respectively, in line 3; the extra position in each array will hold the
sentinel. The for loop of lines 4-5 copies the subarray A[p..q] into L[1..n₁],
and the for loop of lines 6-7 copies the subarray A[q + 1..r] into R[1..n₂].
Lines 8-9 put the sentinels at the ends of the arrays L and R. Lines 10-17,
illustrated in Figure 2.3, perform the r − p + 1 basic steps by maintaining the following
loop invariant:

At the start of each iteration of the for loop of lines 12-17, the subarray
A[p..k − 1] contains the k − p smallest elements of L[1..n₁ + 1] and
R[1..n₂ + 1], in sorted order. Moreover, L[i] and R[j] are the smallest
elements of their arrays that have not been copied back into A.

We  must  show  that  this  loop  invariant  holds  prior  to  the  first  iteration  of  the  for 
loop  of  lines  12-17,  that  each  iteration  of  the  loop  maintains  the  invariant,  and 
that  the  invariant  provides  a  useful  property  to  show  correctness  when  the  loop 
terminates. 

Initialization: Prior to the first iteration of the loop, we have k = p, so that the
subarray A[p..k − 1] is empty. This empty subarray contains the k − p = 0
smallest elements of L and R, and since i = j = 1, both L[i] and R[j] are the
smallest elements of their arrays that have not been copied back into A.




Figure 2.3, continued [Panels (e)-(i) are not reproduced here.] (i) The arrays and indices at
termination. At this point, the subarray in A[9..16] is sorted, and the two sentinels in L and R
are the only two elements in these arrays that have not been copied into A.

Maintenance: To see that each iteration maintains the loop invariant, let us first
suppose that L[i] ≤ R[j]. Then L[i] is the smallest element not yet copied
back into A. Because A[p..k − 1] contains the k − p smallest elements, after
line 14 copies L[i] into A[k], the subarray A[p..k] will contain the k − p + 1
smallest elements. Incrementing k (in the for loop update) and i (in line 15)
reestablishes the loop invariant for the next iteration. If instead L[i] > R[j],
then lines 16-17 perform the appropriate action to maintain the loop invariant.

Termination: At termination, k = r + 1. By the loop invariant, the subarray
A[p..k − 1], which is A[p..r], contains the k − p = r − p + 1 smallest
elements of L[1..n₁ + 1] and R[1..n₂ + 1], in sorted order. The arrays L
and R together contain n₁ + n₂ + 2 = r − p + 3 elements. All but the two
largest have been copied back into A, and these two largest elements are the
sentinels.

To see that the MERGE procedure runs in Θ(n) time, where n = r − p + 1,
observe that each of lines 1-3 and 8-11 takes constant time, the for loops of
lines 4-7 take Θ(n₁ + n₂) = Θ(n) time,⁷ and there are n iterations of the for
loop of lines 12-17, each of which takes constant time.
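
The loop invariant and the step count argued above can also be checked mechanically on concrete inputs. The sketch below is only an illustration, assuming 0-based inclusive indices and finite numeric keys. The function name `merge_checked` and the helper list `everything` are hypothetical. It asserts both halves of the invariant before every iteration, asserts the termination property, and returns the number of basic steps performed, which should equal n = r − p + 1.

```python
import math

def merge_checked(A, p, q, r):
    """Sentinel merge that asserts the loop invariant before each iteration."""
    L = A[p:q + 1] + [math.inf]
    R = A[q + 1:r + 1] + [math.inf]
    everything = sorted(L + R)           # all elements; the sentinels come last
    i = j = 0
    steps = 0
    for k in range(p, r + 1):
        # A[p..k-1] holds the k - p smallest elements, in sorted order.
        assert A[p:k] == everything[:k - p]
        # L[i] and R[j] are the smallest elements not yet copied back.
        assert L[i] == min(L[i:]) and R[j] == min(R[j:])
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1
        steps += 1
    # Termination: only the two sentinels remain uncopied.
    assert A[p:r + 1] == everything[:r - p + 1]
    assert L[i] == math.inf and R[j] == math.inf
    return steps

cards = [2, 4, 5, 7, 1, 2, 3, 6]
print(merge_checked(cards, 0, 3, 7))  # 8
```

Passing assertions on sample inputs are evidence, not proof; the three-part argument remains what establishes correctness for all inputs. If the precondition fails (a run that is not sorted), the second assertion fails.
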

We can now use the MERGE procedure as a subroutine in the merge sort
algorithm. The procedure MERGE-SORT(A, p, r) sorts the elements in the
subarray A[p..r]. If p ≥ r, the subarray has at most one element and is therefore
already sorted. Otherwise, the divide step simply computes an index q that
partitions A[p..r] into two subarrays: A[p..q], containing ⌈n/2⌉ elements, and
A[q + 1..r], containing ⌊n/2⌋ elements.⁸

MERGE-SORT(A, p, r)
1  if p < r
2      q = ⌊(p + r)/2⌋
3      MERGE-SORT(A, p, q)
4      MERGE-SORT(A, q + 1, r)
5      MERGE(A, p, q, r)

To sort the entire sequence A = ⟨A[1], A[2], ..., A[n]⟩, we make the initial call
MERGE-SORT(A, 1, A.length), where once again A.length = n. Figure 2.4
illustrates the operation of the procedure bottom-up when n is a power of 2. The
algorithm consists of merging pairs of 1-item sequences to form sorted sequences
of length 2, merging pairs of sequences of length 2 to form sorted sequences of
length 4, and so on, until two sequences of length n/2 are merged to form the final
sorted sequence of length n.
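
The divide, conquer, and combine steps of MERGE-SORT can be sketched in Python as follows. This is a hedged illustration rather than the book's own code: indices are 0-based and inclusive, keys are assumed to be finite numbers, and the default arguments on `merge_sort` are a convenience we introduce so that the initial call needs only the list.

```python
import math

def merge(A, p, q, r):
    """Sentinel-based merge of sorted A[p..q] and A[q+1..r] (0-based, inclusive)."""
    L = A[p:q + 1] + [math.inf]
    R = A[q + 1:r + 1] + [math.inf]
    i = j = 0
    for k in range(p, r + 1):
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1

def merge_sort(A, p=0, r=None):
    """Sort A[p..r] in place by divide and conquer."""
    if r is None:
        r = len(A) - 1
    if p < r:                      # two or more elements remain
        q = (p + r) // 2           # divide: floor of the midpoint
        merge_sort(A, p, q)        # conquer the left half
        merge_sort(A, q + 1, r)    # conquer the right half
        merge(A, p, q, r)          # combine

data = [5, 2, 4, 7, 1, 3, 2, 6]
merge_sort(data)
print(data)  # [1, 2, 2, 3, 4, 5, 6, 7]
```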

2.3.2  Analyzing  divide-and-conquer  algorithms 

When  an  algorithm  contains  a  recursive  call  to  itself,  we  can  often  describe  its 
running  time  by  a  recurrence  equation  or  recurrence,  which  describes  the  overall 
running  time  on  a  problem  of  size  n  in  terms  of  the  running  time  on  smaller  inputs. 
We  can  then  use  mathematical  tools  to  solve  the  recurrence  and  provide  bounds  on 
the  performance  of  the  algorithm. 


⁷ We shall see in Chapter 3 how to formally interpret equations containing Θ-notation.

⁸ The expression ⌈x⌉ denotes the least integer greater than or equal to x, and ⌊x⌋ denotes the greatest
integer less than or equal to x. These notations are defined in Chapter 3. The easiest way to verify
that setting q to ⌊(p + r)/2⌋ yields subarrays A[p..q] and A[q + 1..r] of sizes ⌈n/2⌉ and ⌊n/2⌋,
respectively, is to examine the four cases that arise depending on whether each of p and r is odd or
even.

Figure 2.4 The operation of merge sort on the array A = ⟨5, 2, 4, 7, 1, 3, 2, 6⟩. The lengths of the
sorted sequences being merged increase as the algorithm progresses from bottom to top.

A  recurrence  for  the  running  time  of  a  divide-and-conquer  algorithm  falls  out 
from the three steps of the basic paradigm. As before, we let T(n) be the running
time on a problem of size n. If the problem size is small enough, say n ≤ c
for some constant c, the straightforward solution takes constant time, which we
write as Θ(1). Suppose that our division of the problem yields a subproblems,
each of which is 1/b the size of the original. (For merge sort, both a and b are 2,
but we shall see many divide-and-conquer algorithms in which a ≠ b.) It takes
time T(n/b) to solve one subproblem of size n/b, and so it takes time aT(n/b)
to solve a of them. If we take D(n) time to divide the problem into subproblems
and C(n) time to combine the solutions to the subproblems into the solution to the
original problem, we get the recurrence

\[
T(n) = \begin{cases}
\Theta(1) & \text{if } n \le c, \\
aT(n/b) + D(n) + C(n) & \text{otherwise.}
\end{cases}
\]

In  Chapter  4,  we  shall  see  how  to  solve  common  recurrences  of  this  form. 

Analysis  of  merge  sort 

Although the pseudocode for MERGE-SORT works correctly when the number of
elements is not even, our recurrence-based analysis is simplified if we assume that
the original problem size is a power of 2. Each divide step then yields two
subsequences of size exactly n/2. In Chapter 4, we shall see that this assumption does
not affect the order of growth of the solution to the recurrence.

We  reason  as  follows  to  set  up  the  recurrence  for  T(n),  the  worst-case  running 
time  of  merge  sort  on  n  numbers.  Merge  sort  on  just  one  element  takes  constant 
time.  When  we  have  n  >  1  elements,  we  break  down  the  running  time  as  follows. 


Divide: The divide step just computes the middle of the subarray, which takes
constant time. Thus, D(n) = Θ(1).

Conquer: We recursively solve two subproblems, each of size n/2, which
contributes 2T(n/2) to the running time.

Combine: We have already noted that the MERGE procedure on an n-element
subarray takes time Θ(n), and so C(n) = Θ(n).


When we add the functions D(n) and C(n) for the merge sort analysis, we are
adding a function that is Θ(n) and a function that is Θ(1). This sum is a linear
function of n, that is, Θ(n). Adding it to the 2T(n/2) term from the “conquer”
step gives the recurrence for the worst-case running time T(n) of merge sort:


\[
T(n) = \begin{cases}
\Theta(1) & \text{if } n = 1, \\
2T(n/2) + \Theta(n) & \text{if } n > 1.
\end{cases}
\tag{2.1}
\]


In Chapter 4, we shall see the “master theorem,” which we can use to show
that T(n) is Θ(n lg n), where lg n stands for log₂ n. Because the logarithm
function grows more slowly than any linear function, for large enough inputs, merge
sort, with its Θ(n lg n) running time, outperforms insertion sort, whose running
time is Θ(n²), in the worst case.

We do not need the master theorem to intuitively understand why the solution to
the recurrence (2.1) is T(n) = Θ(n lg n). Let us rewrite recurrence (2.1) as

\[
T(n) = \begin{cases}
c & \text{if } n = 1, \\
2T(n/2) + cn & \text{if } n > 1,
\end{cases}
\tag{2.2}
\]


where the constant c represents the time required to solve problems of size 1 as
well as the time per array element of the divide and combine steps.⁹


⁹ It is unlikely that the same constant exactly represents both the time to solve problems of size 1
and the time per array element of the divide and combine steps. We can get around this problem by
letting c be the larger of these times and understanding that our recurrence gives an upper bound on
the running time, or by letting c be the lesser of these times and understanding that our recurrence
gives a lower bound on the running time. Both bounds are on the order of n lg n and, taken together,
give a Θ(n lg n) running time.

Figure 2.5 shows how we can solve recurrence (2.2). For convenience, we
assume that n is an exact power of 2. Part (a) of the figure shows T(n), which we
expand in part (b) into an equivalent tree representing the recurrence. The cn term
is the root (the cost incurred at the top level of recursion), and the two subtrees of
the root are the two smaller recurrences T(n/2). Part (c) shows this process carried
one step further by expanding T(n/2). The cost incurred at each of the two
subnodes at the second level of recursion is cn/2. We continue expanding each node
in the tree by breaking it into its constituent parts as determined by the recurrence,
until the problem sizes get down to 1, each with a cost of c. Part (d) shows the
resulting recursion tree.

Next, we add the costs across each level of the tree. The top level has total
cost cn, the next level down has total cost c(n/2) + c(n/2) = cn, the level after
that has total cost c(n/4) + c(n/4) + c(n/4) + c(n/4) = cn, and so on. In general,
the level i below the top has 2ⁱ nodes, each contributing a cost of c(n/2ⁱ), so that
the ith level below the top has total cost 2ⁱ c(n/2ⁱ) = cn. The bottom level has n
nodes, each contributing a cost of c, for a total cost of cn.

The total number of levels of the recursion tree in Figure 2.5 is lg n + 1, where
n is the number of leaves, corresponding to the input size. An informal inductive
argument justifies this claim. The base case occurs when n = 1, in which case the
tree has only one level. Since lg 1 = 0, we have that lg n + 1 gives the correct
number of levels. Now assume as an inductive hypothesis that the number of levels
of a recursion tree with 2ⁱ leaves is lg 2ⁱ + 1 = i + 1 (since for any value of i,
we have that lg 2ⁱ = i). Because we are assuming that the input size is a power
of 2, the next input size to consider is 2ⁱ⁺¹. A tree with n = 2ⁱ⁺¹ leaves has
one more level than a tree with 2ⁱ leaves, and so the total number of levels is
(i + 1) + 1 = lg 2ⁱ⁺¹ + 1.

To compute the total cost represented by the recurrence (2.2), we simply add up
the costs of all the levels. The recursion tree has lg n + 1 levels, each costing cn,
for a total cost of cn(lg n + 1) = cn lg n + cn. Ignoring the low-order term and
the constant c gives the desired result of Θ(n lg n).
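
The recursion-tree argument can be checked numerically. The sketch below is an illustration under the same assumption as the text, namely that n is an exact power of 2. The function names `T`, `level_costs`, and `closed_form` are our own. It evaluates recurrence (2.2) directly, lists the cost of each level of the tree, and compares both against cn lg n + cn.

```python
def T(n, c=1):
    """Recurrence (2.2): T(1) = c and T(n) = 2 T(n/2) + c n, for n a power of 2."""
    if n == 1:
        return c
    return 2 * T(n // 2, c) + c * n

def level_costs(n, c=1):
    """Cost of each level of the recursion tree, from the root down to the leaves."""
    lg_n = n.bit_length() - 1                    # exact lg n when n is a power of 2
    return [2**i * c * (n // 2**i) for i in range(lg_n + 1)]

def closed_form(n, c=1):
    """The total cost c n lg n + c n derived from the recursion tree."""
    lg_n = n.bit_length() - 1
    return c * n * lg_n + c * n

print(level_costs(8))          # [8, 8, 8, 8]
print(T(8), closed_form(8))    # 32 32
```

For n = 8 the tree has lg 8 + 1 = 4 levels, each of cost 8c, which matches the per-level sum described above.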

Exercises 


2.3-1 

Using  Figure  2.4  as  a  model,  illustrate  the  operation  of  merge  sort  on  the  array 
A = ⟨3, 41, 52, 26, 38, 57, 9, 49⟩.


2.3-2 

Rewrite the MERGE procedure so that it does not use sentinels, instead stopping
once either array L or R has had all its elements copied back to A and then copying
the remainder of the other array back into A.

Figure 2.5 How to construct a recursion tree for the recurrence T(n) = 2T(n/2) + cn.
Part (a) shows T(n), which progressively expands in (b)-(d) to form the recursion tree. The fully
expanded tree in part (d) has lg n + 1 levels (i.e., it has height lg n, as indicated), and each level
contributes a total cost of cn. The total cost, therefore, is cn lg n + cn, which is Θ(n lg n).

2.3-3

Use mathematical induction to show that when n is an exact power of 2, the
solution of the recurrence

\[
T(n) = \begin{cases}
2 & \text{if } n = 2, \\
2T(n/2) + n & \text{if } n = 2^k, \text{ for } k > 1
\end{cases}
\]

is T(n) = n lg n.

2.3-4

We can express insertion sort as a recursive procedure as follows. In order to sort
A[1..n], we recursively sort A[1..n − 1] and then insert A[n] into the sorted array
A[1..n − 1]. Write a recurrence for the running time of this recursive version of
insertion sort.


2.3-5

Referring back to the searching problem (see Exercise 2.1-3), observe that if the
sequence A is sorted, we can check the midpoint of the sequence against v and
eliminate half of the sequence from further consideration. The binary search
algorithm repeats this procedure, halving the size of the remaining portion of the
sequence each time. Write pseudocode, either iterative or recursive, for binary
search. Argue that the worst-case running time of binary search is Θ(lg n).

2.3-6

Observe that the while loop of lines 5-7 of the INSERTION-SORT procedure in
Section 2.1 uses a linear search to scan (backward) through the sorted subarray
A[1..j − 1]. Can we use a binary search (see Exercise 2.3-5) instead to improve
the overall worst-case running time of insertion sort to Θ(n lg n)?

2.3-7 *

Describe a Θ(n lg n)-time algorithm that, given a set S of n integers and another
integer x, determines whether or not there exist two elements in S whose sum is
exactly x.


Problems 


2-1  Insertion  sort  on  small  arrays  in  merge  sort 

Although merge sort runs in Θ(n lg n) worst-case time and insertion sort runs
in Θ(n²) worst-case time, the constant factors in insertion sort can make it faster
in practice for small problem sizes on many machines. Thus, it makes sense to
coarsen the leaves of the recursion by using insertion sort within merge sort when
subproblems become sufficiently small. Consider a modification to merge sort in
which n/k sublists of length k are sorted using insertion sort and then merged
using the standard merging mechanism, where k is a value to be determined.

a. Show that insertion sort can sort the n/k sublists, each of length k, in Θ(nk)
worst-case time.

b. Show how to merge the sublists in Θ(n lg(n/k)) worst-case time.

c. Given that the modified algorithm runs in Θ(nk + n lg(n/k)) worst-case time,
what is the largest value of k as a function of n for which the modified algorithm
has the same running time as standard merge sort, in terms of Θ-notation?

d.  How  should  we  choose  k  in  practice? 

2-2  Correctness  of  bubblesort 

Bubblesort  is  a  popular,  but  inefficient,  sorting  algorithm.  It  works  by  repeatedly 
swapping  adjacent  elements  that  are  out  of  order. 

BUBBLESORT(A)
1  for i = 1 to A.length − 1
2      for j = A.length downto i + 1
3          if A[j] < A[j − 1]
4              exchange A[j] with A[j − 1]
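
As a runnable companion to the BUBBLESORT pseudocode, here is a direct Python transliteration. It is a sketch only: indices are shifted to Python's 0-based convention, so i runs over 0..n−2 and j runs from n−1 down to i+1, and the function name `bubblesort` is our own. It makes no attempt to argue correctness.

```python
def bubblesort(A):
    """Sort the list A in place by repeatedly swapping adjacent out-of-order elements."""
    n = len(A)
    for i in range(n - 1):                 # pseudocode line 1, shifted to 0-based
        for j in range(n - 1, i, -1):      # pseudocode line 2: j = n-1 down to i+1
            if A[j] < A[j - 1]:            # pseudocode line 3
                A[j], A[j - 1] = A[j - 1], A[j]   # pseudocode line 4: exchange

values = [5, 2, 4, 7, 1, 3, 2, 6]
bubblesort(values)
print(values)  # [1, 2, 2, 3, 4, 5, 6, 7]
```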

a. Let A′ denote the output of BUBBLESORT(A). To prove that BUBBLESORT is
correct, we need to prove that it terminates and that

\[
A'[1] \le A'[2] \le \cdots \le A'[n], \tag{2.3}
\]

where n = A.length. In order to show that BUBBLESORT actually sorts, what
else do we need to prove?

The  next  two  parts  will  prove  inequality  (2.3). 

b. State precisely a loop invariant for the for loop in lines 2-4, and prove that this
loop invariant holds. Your proof should use the structure of the loop invariant
proof presented in this chapter.

c. Using the termination condition of the loop invariant proved in part (b), state
a loop invariant for the for loop in lines 1-4 that will allow you to prove
inequality (2.3). Your proof should use the structure of the loop invariant proof
presented in this chapter.

d. What is the worst-case running time of bubblesort? How does it compare to the
running time of insertion sort?

2-3  Correctness  of  Horner’s  rule 

The following code fragment implements Horner’s rule for evaluating a polynomial

\[
P(x) = \sum_{k=0}^{n} a_k x^k
     = a_0 + x(a_1 + x(a_2 + \cdots + x(a_{n-1} + x a_n) \cdots)),
\]

given the coefficients a₀, a₁, ..., aₙ and a value for x:

1  y = 0
2  for i = n downto 0
3      y = aᵢ + x · y
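
The three-line fragment translates almost literally into Python. In this sketch, `coeffs[k]` is assumed to hold the coefficient aₖ, so the list has n + 1 entries. The function name `horner` and the sample polynomial are our own choices for illustration.

```python
def horner(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n, where coeffs[k] is a_k."""
    y = 0
    for a in reversed(coeffs):     # i = n downto 0
        y = a + x * y
    return y

# P(x) = 1 + 2x + 3x^2 evaluated at x = 2: 1 + 4 + 12
print(horner([1, 2, 3], 2))  # 17
```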

a. In terms of Θ-notation, what is the running time of this code fragment for
Horner’s rule?

b.  Write  pseudocode  to  implement  the  naive  polynomial-evaluation  algorithm  that 
computes  each  term  of  the  polynomial  from  scratch.  What  is  the  running  time 
of  this  algorithm?  How  does  it  compare  to  Horner’s  rule? 

c. Consider the following loop invariant:

At the start of each iteration of the for loop of lines 2-3,

\[
y = \sum_{k=0}^{n-(i+1)} a_{k+i+1} x^k .
\]

Interpret a summation with no terms as equaling 0. Following the structure of
the loop invariant proof presented in this chapter, use this loop invariant to show
that, at termination, $y = \sum_{k=0}^{n} a_k x^k$.

d. Conclude by arguing that the given code fragment correctly evaluates a
polynomial characterized by the coefficients a₀, a₁, ..., aₙ.

2-4  Inversions 

Let A[1..n] be an array of n distinct numbers. If i < j and A[i] > A[j], then the
pair (i, j) is called an inversion of A.
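
To make the definition concrete, the following sketch enumerates inversions by brute force, testing every pair i < j. It is only an illustration of the definition, and it takes quadratic time. The function name `inversions` and the sample array are our own, and the pairs are reported with 1-based indices to match the text.

```python
def inversions(A):
    """Return all pairs (i, j), 1-based, with i < j and A[i] > A[j]."""
    n = len(A)
    return [(i + 1, j + 1)
            for i in range(n)
            for j in range(i + 1, n)
            if A[i] > A[j]]

print(inversions([3, 1, 2]))  # [(1, 2), (1, 3)]
```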

a. List the five inversions of the array ⟨2, 3, 8, 6, 1⟩.

b. What array with elements from the set {1, 2, ..., n} has the most inversions?
How many does it have?

c.  What  is  the  relationship  between  the  running  time  of  insertion  sort  and  the 
number  of  inversions  in  the  input  array?  Justify  your  answer. 

d.  Give  an  algorithm  that  determines  the  number  of  inversions  in  any  permutation 
on n elements in Θ(n lg n) worst-case time. (Hint: Modify merge sort.)


Chapter  notes 

In  1968,  Knuth  published  the  first  of  three  volumes  with  the  general  title  The  Art  of 
Computer  Programming  [209,  210,  211].  The  first  volume  ushered  in  the  modern 
study  of  computer  algorithms  with  a  focus  on  the  analysis  of  running  time,  and  the 
full  series  remains  an  engaging  and  worthwhile  reference  for  many  of  the  topics 
presented  here.  According  to  Knuth,  the  word  “algorithm”  is  derived  from  the 
name  “al-Khowarizmi,”  a  ninth-century  Persian  mathematician. 

Aho, Hopcroft, and Ullman [5] advocated the asymptotic analysis of
algorithms, using notations that Chapter 3 introduces, including Θ-notation, as a
means of comparing relative performance. They also popularized the use of
recurrence relations to describe the running times of recursive algorithms.

Knuth  [211]  provides  an  encyclopedic  treatment  of  many  sorting  algorithms.  His 
comparison  of  sorting  algorithms  (page  381)  includes  exact  step-counting  analyses, 
like  the  one  we  performed  here  for  insertion  sort.  Knuth’s  discussion  of  insertion 
sort  encompasses  several  variations  of  the  algorithm.  The  most  important  of  these 
is  Shell’s  sort,  introduced  by  D.  L.  Shell,  which  uses  insertion  sort  on  periodic 
subsequences  of  the  input  to  produce  a  faster  sorting  algorithm. 

Merge sort is also described by Knuth. He mentions that a mechanical
collator capable of merging two decks of punched cards in a single pass was invented
in 1938. J. von Neumann, one of the pioneers of computer science, apparently
wrote a program for merge sort on the EDVAC computer in 1945.

The early history of proving programs correct is described by Gries [153], who
credits P. Naur with the first article in this field. Gries attributes loop invariants to
R. W. Floyd. The textbook by Mitchell [256] describes more recent progress in
proving programs correct.

The order of growth of the running time of an algorithm, defined in Chapter 2,
gives a simple characterization of the algorithm’s efficiency and also allows us to
compare the relative performance of alternative algorithms. Once the input size n
becomes large enough, merge sort, with its Θ(n lg n) worst-case running time,
beats insertion sort, whose worst-case running time is Θ(n²). Although we can
sometimes determine the exact running time of an algorithm, as we did for insertion
sort in Chapter 2, the extra precision is not usually worth the effort of computing
it. For large enough inputs, the multiplicative constants and lower-order terms of
an exact running time are dominated by the effects of the input size itself.

When  we  look  at  input  sizes  large  enough  to  make  only  the  order  of  growth  of 
the  running  time  relevant,  we  are  studying  the  asymptotic  efficiency  of  algorithms. 
That  is,  we  are  concerned  with  how  the  running  time  of  an  algorithm  increases  with 
the  size  of  the  input  in  the  limit,  as  the  size  of  the  input  increases  without  bound. 
Usually,  an  algorithm  that  is  asymptotically  more  efficient  will  be  the  best  choice 
for  all  but  very  small  inputs. 

This chapter gives several standard methods for simplifying the asymptotic
analysis of algorithms. The next section begins by defining several types of
“asymptotic notation,” of which we have already seen an example in Θ-notation. We then
present  several  notational  conventions  used  throughout  this  book,  and  finally  we 
review  the  behavior  of  functions  that  commonly  arise  in  the  analysis  of  algorithms. 


3.1  Asymptotic  notation 

The  notations  we  use  to  describe  the  asymptotic  running  time  of  an  algorithm 
are  defined  in  terms  of  functions  whose  domains  are  the  set  of  natural  numbers 
N = {0, 1, 2, ...}. Such notations are convenient for describing the worst-case
running-time function T(n), which usually is defined only on integer input sizes.
We sometimes find it convenient, however, to abuse asymptotic notation in a
variety of ways. For example, we might extend the notation to the domain of real
numbers  or,  alternatively,  restrict  it  to  a  subset  of  the  natural  numbers.  We  should 
make  sure,  however,  to  understand  the  precise  meaning  of  the  notation  so  that  when 
we  abuse,  we  do  not  misuse  it.  This  section  defines  the  basic  asymptotic  notations 
and  also  introduces  some  common  abuses. 

Asymptotic  notation,  functions,  and  running  times 

We will use asymptotic notation primarily to describe the running times of
algorithms, as when we wrote that insertion sort’s worst-case running time is Θ(n²).
Asymptotic notation actually applies to functions, however. Recall that we
characterized insertion sort’s worst-case running time as an² + bn + c, for some constants
a, b, and c. By writing that insertion sort’s running time is Θ(n²), we abstracted
away some details of this function. Because asymptotic notation applies to
functions, what we were writing as Θ(n²) was the function an² + bn + c, which in
that case happened to characterize the worst-case running time of insertion sort.

In  this  book,  the  functions  to  which  we  apply  asymptotic  notation  will  usually 
characterize  the  running  times  of  algorithms.  But  asymptotic  notation  can  apply  to 
functions  that  characterize  some  other  aspect  of  algorithms  (the  amount  of  space 
they  use,  for  example),  or  even  to  functions  that  have  nothing  whatsoever  to  do 
with  algorithms. 

Even when we use asymptotic notation to apply to the running time of an
algorithm, we need to understand which running time we mean. Sometimes we are
interested  in  the  worst-case  running  time.  Often,  however,  we  wish  to  characterize 
the  running  time  no  matter  what  the  input.  In  other  words,  we  often  wish  to  make 
a  blanket  statement  that  covers  all  inputs,  not  just  the  worst  case.  We  shall  see 
asymptotic  notations  that  are  well  suited  to  characterizing  running  times  no  matter 
what  the  input. 

Θ-notation

In Chapter 2, we found that the worst-case running time of insertion sort is
T(n) = Θ(n²). Let us define what this notation means. For a given function g(n),
we denote by Θ(g(n)) the set of functions

Θ(g(n)) = { f(n) : there exist positive constants c₁, c₂, and n₀ such that
            0 ≤ c₁g(n) ≤ f(n) ≤ c₂g(n) for all n ≥ n₀ }.¹


¹ Within set notation, a colon means “such that.”

Figure 3.1 Graphic examples of the Θ, O, and Ω notations. In each part, the value of n₀ shown
is the minimum possible value; any greater value would also work. (a) Θ-notation bounds a
function to within constant factors. We write f(n) = Θ(g(n)) if there exist positive constants n₀, c₁,
and c₂ such that at and to the right of n₀, the value of f(n) always lies between c₁g(n) and c₂g(n)
inclusive. (b) O-notation gives an upper bound for a function to within a constant factor. We write
f(n) = O(g(n)) if there are positive constants n₀ and c such that at and to the right of n₀, the value
of f(n) always lies on or below cg(n). (c) Ω-notation gives a lower bound for a function to within
a constant factor. We write f(n) = Ω(g(n)) if there are positive constants n₀ and c such that at and
to the right of n₀, the value of f(n) always lies on or above cg(n).

A function $f(n)$ belongs to the set $\Theta(g(n))$ if there exist positive constants $c_1$
and $c_2$ such that it can be “sandwiched” between $c_1 g(n)$ and $c_2 g(n)$, for sufficiently
large $n$. Because $\Theta(g(n))$ is a set, we could write “$f(n) \in \Theta(g(n))$”
to indicate that $f(n)$ is a member of $\Theta(g(n))$. Instead, we will usually write
“$f(n) = \Theta(g(n))$” to express the same notion. You might be confused because
we abuse equality in this way, but we shall see later in this section that doing so
has its advantages.

Figure 3.1(a) gives an intuitive picture of functions $f(n)$ and $g(n)$, where
$f(n) = \Theta(g(n))$. For all values of $n$ at and to the right of $n_0$, the value of $f(n)$
lies at or above $c_1 g(n)$ and at or below $c_2 g(n)$. In other words, for all $n \ge n_0$, the
function $f(n)$ is equal to $g(n)$ to within a constant factor. We say that $g(n)$ is an
asymptotically tight bound for $f(n)$.

The definition of $\Theta(g(n))$ requires that every member $f(n) \in \Theta(g(n))$ be
asymptotically nonnegative, that is, that $f(n)$ be nonnegative whenever $n$ is sufficiently
large. (An asymptotically positive function is one that is positive for all
sufficiently large $n$.) Consequently, the function $g(n)$ itself must be asymptotically
nonnegative, or else the set $\Theta(g(n))$ is empty. We shall therefore assume that every
function used within $\Theta$-notation is asymptotically nonnegative. This assumption
holds for the other asymptotic notations defined in this chapter as well.

In Chapter 2, we introduced an informal notion of $\Theta$-notation that amounted
to throwing away lower-order terms and ignoring the leading coefficient of the
highest-order term. Let us briefly justify this intuition by using the formal definition
to show that $\frac{1}{2}n^2 - 3n = \Theta(n^2)$. To do so, we must determine positive
constants $c_1$, $c_2$, and $n_0$ such that

$$c_1 n^2 \le \frac{1}{2}n^2 - 3n \le c_2 n^2$$

for all $n \ge n_0$. Dividing by $n^2$ yields

$$c_1 \le \frac{1}{2} - \frac{3}{n} \le c_2.$$

We can make the right-hand inequality hold for any value of $n \ge 1$ by choosing any
constant $c_2 \ge 1/2$. Likewise, we can make the left-hand inequality hold for any
value of $n \ge 7$ by choosing any constant $c_1 \le 1/14$. Thus, by choosing $c_1 = 1/14$,
$c_2 = 1/2$, and $n_0 = 7$, we can verify that $\frac{1}{2}n^2 - 3n = \Theta(n^2)$. Certainly, other
choices for the constants exist, but the important thing is that some choice exists.
Note that these constants depend on the function $\frac{1}{2}n^2 - 3n$; a different function
belonging to $\Theta(n^2)$ would usually require different constants.
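As a numerical sanity check (not a proof), the constants chosen above can be tested over a finite range of $n$. The following Python sketch uses exact rational arithmetic; the names `f`, `sandwiched`, `C1`, `C2`, and `N0` are illustrative only and do not come from the text.

```python
from fractions import Fraction

# Constants from the worked example: c1 = 1/14, c2 = 1/2, n0 = 7.
C1, C2, N0 = Fraction(1, 14), Fraction(1, 2), 7

def f(n):
    """The function n^2/2 - 3n, evaluated exactly."""
    return Fraction(n * n, 2) - 3 * n

def sandwiched(n):
    """True if 0 <= c1*n^2 <= f(n) <= c2*n^2 holds at this n."""
    return 0 <= C1 * n * n <= f(n) <= C2 * n * n

# The inequalities hold from n0 onward (checked on a finite range only).
holds_from_n0 = all(sandwiched(n) for n in range(N0, 2000))
```

At $n = 7$ the left-hand inequality holds with equality, and at $n = 6$ it fails, which is consistent with $n_0 = 7$ being the smallest threshold that works for $c_1 = 1/14$.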

We can also use the formal definition to verify that $6n^3 \ne \Theta(n^2)$. Suppose
for the purpose of contradiction that $c_2$ and $n_0$ exist such that $6n^3 \le c_2 n^2$ for
all $n \ge n_0$. But then dividing by $n^2$ yields $6n \le c_2$, that is, $n \le c_2/6$, which
cannot possibly hold for arbitrarily large $n$, since $c_2$ is constant.

Intuitively, the lower-order terms of an asymptotically positive function can be
ignored in determining asymptotically tight bounds because they are insignificant
for large $n$. When $n$ is large, even a tiny fraction of the highest-order term suffices
to dominate the lower-order terms. Thus, setting $c_1$ to a value that is slightly
smaller than the coefficient of the highest-order term and setting $c_2$ to a value that
is slightly larger permits the inequalities in the definition of $\Theta$-notation to be satisfied.
The coefficient of the highest-order term can likewise be ignored, since it
only changes $c_1$ and $c_2$ by a constant factor equal to the coefficient.

As an example, consider any quadratic function $f(n) = an^2 + bn + c$, where
$a$, $b$, and $c$ are constants and $a > 0$. Throwing away the lower-order terms and
ignoring the constant yields $f(n) = \Theta(n^2)$. Formally, to show the same thing, we
take the constants $c_1 = a/4$, $c_2 = 7a/4$, and $n_0 = 2 \cdot \max(|b|/a, \sqrt{|c|/a})$. You
may verify that $0 \le c_1 n^2 \le an^2 + bn + c \le c_2 n^2$ for all $n \ge n_0$. In general,
for any polynomial $p(n) = \sum_{i=0}^{d} a_i n^i$, where the $a_i$ are constants and $a_d > 0$, we
have $p(n) = \Theta(n^d)$ (see Problem 3-1).
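The constants for the quadratic case can also be tried out numerically. The sketch below is only an illustration under the assumption of integer coefficients with $a > 0$; the helper names `theta_constants` and `bounds_hold` are hypothetical, and a finite check is of course no substitute for the proof.

```python
import math
from fractions import Fraction

def theta_constants(a, b, c):
    """Return (c1, c2, n0) as given in the text for f(n) = a*n^2 + b*n + c, a > 0."""
    c1 = Fraction(a) / 4
    c2 = 7 * Fraction(a) / 4
    n0 = 2 * max(abs(b) / a, math.sqrt(abs(c) / a))
    return c1, c2, n0

def bounds_hold(a, b, c, n):
    """True if 0 <= c1*n^2 <= a*n^2 + b*n + c <= c2*n^2 at this integer n."""
    c1, c2, _ = theta_constants(a, b, c)
    value = a * n * n + b * n + c
    return 0 <= c1 * n * n <= value <= c2 * n * n

def check_from_n0(a, b, c, span=500):
    """Check the bounds for `span` consecutive integers starting at ceil(n0)."""
    start = math.ceil(theta_constants(a, b, c)[2])
    return all(bounds_hold(a, b, c, n) for n in range(start, start + span))
```

For example, with $a = 1$, $b = -3$, $c = 2$ the threshold is $n_0 = 6$, and `check_from_n0(1, -3, 2)` examines $n = 6, 7, \ldots, 505$.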

Since any constant is a degree-0 polynomial, we can express any constant function
as $\Theta(n^0)$, or $\Theta(1)$. This latter notation is a minor abuse, however, because the
expression does not indicate what variable is tending to infinity.² We shall often
use the notation $\Theta(1)$ to mean either a constant or a constant function with respect
to some variable.

O-notation

The $\Theta$-notation asymptotically bounds a function from above and below. When
we have only an asymptotic upper bound, we use $O$-notation. For a given function
$g(n)$, we denote by $O(g(n))$ (pronounced “big-oh of g of n” or sometimes
just “oh of g of n”) the set of functions

$$O(g(n)) = \{ f(n) : \text{there exist positive constants } c \text{ and } n_0 \text{ such that } 0 \le f(n) \le c g(n) \text{ for all } n \ge n_0 \}.$$

We use $O$-notation to give an upper bound on a function, to within a constant
factor. Figure 3.1(b) shows the intuition behind $O$-notation. For all values $n$ at and
to the right of $n_0$, the value of the function $f(n)$ is on or below $c g(n)$.

We write $f(n) = O(g(n))$ to indicate that a function $f(n)$ is a member of the
set $O(g(n))$. Note that $f(n) = \Theta(g(n))$ implies $f(n) = O(g(n))$, since
$\Theta$-notation is a stronger notion than $O$-notation. Written set-theoretically, we have
$\Theta(g(n)) \subseteq O(g(n))$. Thus, our proof that any quadratic function $an^2 + bn + c$,
where $a > 0$, is in $\Theta(n^2)$ also shows that any such quadratic function is in $O(n^2)$.
What may be more surprising is that when $a > 0$, any linear function $an + b$ is
in $O(n^2)$, which is easily verified by taking $c = a + |b|$ and $n_0 = \max(1, -b/a)$.

If you have seen $O$-notation before, you might find it strange that we should
write, for example, $n = O(n^2)$. In the literature, we sometimes find $O$-notation
informally describing asymptotically tight bounds, that is, what we have defined
using $\Theta$-notation. In this book, however, when we write $f(n) = O(g(n))$, we
are merely claiming that some constant multiple of $g(n)$ is an asymptotic upper
bound on $f(n)$, with no claim about how tight an upper bound it is. Distinguishing
asymptotic upper bounds from asymptotically tight bounds is standard in the
algorithms literature.

Using $O$-notation, we can often describe the running time of an algorithm
merely by inspecting the algorithm’s overall structure. For example, the doubly
nested loop structure of the insertion sort algorithm from Chapter 2 immediately
yields an $O(n^2)$ upper bound on the worst-case running time: the cost of each
iteration of the inner loop is bounded from above by $O(1)$ (constant), the indices $i$
and $j$ are both at most $n$, and the inner loop is executed at most once for each of
the $n^2$ pairs of values for $i$ and $j$.

2 The real problem is that our ordinary notation for functions does not distinguish functions from
values. In $\lambda$-calculus, the parameters to a function are clearly specified: the function $n^2$ could be
written as $\lambda n.n^2$, or even $\lambda r.r^2$. Adopting a more rigorous notation, however, would complicate
algebraic manipulations, and so we choose to tolerate the abuse.

Since $O$-notation describes an upper bound, when we use it to bound the worst-case
running time of an algorithm, we have a bound on the running time of the algorithm
on every input, the blanket statement we discussed earlier. Thus, the $O(n^2)$
bound on worst-case running time of insertion sort also applies to its running time
on every input. The $\Theta(n^2)$ bound on the worst-case running time of insertion sort,
however, does not imply a $\Theta(n^2)$ bound on the running time of insertion sort on
every input. For example, we saw in Chapter 2 that when the input is already
sorted, insertion sort runs in $\Theta(n)$ time.

Technically, it is an abuse to say that the running time of insertion sort is $O(n^2)$,
since for a given $n$, the actual running time varies, depending on the particular
input of size $n$. When we say “the running time is $O(n^2)$,” we mean that there is a
function $f(n)$ that is $O(n^2)$ such that for any value of $n$, no matter what particular
input of size $n$ is chosen, the running time on that input is bounded from above by
the value $f(n)$. Equivalently, we mean that the worst-case running time is $O(n^2)$.

Ω-notation

Just as $O$-notation provides an asymptotic upper bound on a function, $\Omega$-notation
provides an asymptotic lower bound. For a given function $g(n)$, we denote
by $\Omega(g(n))$ (pronounced “big-omega of g of n” or sometimes just “omega of g
of n”) the set of functions

$$\Omega(g(n)) = \{ f(n) : \text{there exist positive constants } c \text{ and } n_0 \text{ such that } 0 \le c g(n) \le f(n) \text{ for all } n \ge n_0 \}.$$

Figure 3.1(c) shows the intuition behind $\Omega$-notation. For all values $n$ at or to the
right of $n_0$, the value of $f(n)$ is on or above $c g(n)$.

From the definitions of the asymptotic notations we have seen thus far, it is easy
to prove the following important theorem (see Exercise 3.1-5).

Theorem 3.1

For any two functions $f(n)$ and $g(n)$, we have $f(n) = \Theta(g(n))$ if and only if
$f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$. ■

As an example of the application of this theorem, our proof that $an^2 + bn + c =
\Theta(n^2)$ for any constants $a$, $b$, and $c$, where $a > 0$, immediately implies that
$an^2 + bn + c = \Omega(n^2)$ and $an^2 + bn + c = O(n^2)$. In practice, rather than using
Theorem 3.1 to obtain asymptotic upper and lower bounds from asymptotically
tight bounds, as we did for this example, we usually use it to prove asymptotically
tight bounds from asymptotic upper and lower bounds.

When we say that the running time (no modifier) of an algorithm is $\Omega(g(n))$,
we mean that no matter what particular input of size $n$ is chosen for each value
of $n$, the running time on that input is at least a constant times $g(n)$, for sufficiently
large $n$. Equivalently, we are giving a lower bound on the best-case running time
of an algorithm. For example, the best-case running time of insertion sort is $\Omega(n)$,
which implies that the running time of insertion sort is $\Omega(n)$.

The running time of insertion sort therefore belongs to both $\Omega(n)$ and $O(n^2)$,
since it falls anywhere between a linear function of $n$ and a quadratic function of $n$.
Moreover, these bounds are asymptotically as tight as possible: for instance, the
running time of insertion sort is not $\Omega(n^2)$, since there exists an input for which
insertion sort runs in $\Theta(n)$ time (e.g., when the input is already sorted). It is not
contradictory, however, to say that the worst-case running time of insertion sort
is $\Omega(n^2)$, since there exists an input that causes the algorithm to take $\Omega(n^2)$ time.
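To make the gap between the $\Omega(n)$ and $O(n^2)$ bounds concrete, the following Python sketch counts key comparisons made by a straightforward insertion sort. It is an illustrative implementation written for this example, not the book's pseudocode, and it uses the comparison count as a stand-in for running time; the function name is hypothetical.

```python
def insertion_sort_comparisons(values):
    """Insertion-sort a copy of `values` and return the number of key comparisons."""
    a = list(values)
    comparisons = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0:
            comparisons += 1      # compare key against a[i]
            if a[i] <= key:
                break             # key is already in place relative to a[i]
            a[i + 1] = a[i]       # shift the larger element one slot right
            i -= 1
        a[i + 1] = key
    return comparisons

n = 10
best = insertion_sort_comparisons(range(n))          # already sorted input
worst = insertion_sort_comparisons(range(n, 0, -1))  # reverse-sorted input
```

On an already sorted input of size $n$ this sketch performs $n - 1$ comparisons, while on a reverse-sorted input it performs $n(n-1)/2$, matching the linear best case and quadratic worst case described above.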

Asymptotic  notation  in  equations  and  inequalities 

We have already seen how asymptotic notation can be used within mathematical
formulas. For example, in introducing $O$-notation, we wrote “$n = O(n^2)$.” We
might also write $2n^2 + 3n + 1 = 2n^2 + \Theta(n)$. How do we interpret such formulas?

When the asymptotic notation stands alone (that is, not within a larger formula)
on the right-hand side of an equation (or inequality), as in $n = O(n^2)$, we have
already defined the equal sign to mean set membership: $n \in O(n^2)$. In general,
however, when asymptotic notation appears in a formula, we interpret it as standing
for some anonymous function that we do not care to name. For example, the
formula $2n^2 + 3n + 1 = 2n^2 + \Theta(n)$ means that $2n^2 + 3n + 1 = 2n^2 + f(n)$,
where $f(n)$ is some function in the set $\Theta(n)$. In this case, we let $f(n) = 3n + 1$,
which indeed is in $\Theta(n)$.

Using asymptotic notation in this manner can help eliminate inessential detail
and clutter in an equation. For example, in Chapter 2 we expressed the worst-case
running time of merge sort as the recurrence

$$T(n) = 2T(n/2) + \Theta(n).$$

If we are interested only in the asymptotic behavior of $T(n)$, there is no point in
specifying all the lower-order terms exactly; they are all understood to be included
in the anonymous function denoted by the term $\Theta(n)$.

The number of anonymous functions in an expression is understood to be equal
to the number of times the asymptotic notation appears. For example, in the
expression

$$\sum_{i=1}^{n} O(i),$$

there is only a single anonymous function (a function of $i$). This expression is thus
not the same as $O(1) + O(2) + \cdots + O(n)$, which doesn’t really have a clean
interpretation.

In some cases, asymptotic notation appears on the left-hand side of an equation,
as in

$$2n^2 + \Theta(n) = \Theta(n^2).$$

We interpret such equations using the following rule: No matter how the anonymous
functions are chosen on the left of the equal sign, there is a way to choose
the anonymous functions on the right of the equal sign to make the equation valid.
Thus, our example means that for any function $f(n) \in \Theta(n)$, there is some function
$g(n) \in \Theta(n^2)$ such that $2n^2 + f(n) = g(n)$ for all $n$. In other words, the
right-hand side of an equation provides a coarser level of detail than the left-hand
side.

We can chain together a number of such relationships, as in

$$\begin{aligned}
2n^2 + 3n + 1 &= 2n^2 + \Theta(n) \\
&= \Theta(n^2).
\end{aligned}$$

We can interpret each equation separately by the rules above. The first equation
says that there is some function $f(n) \in \Theta(n)$ such that $2n^2 + 3n + 1 =
2n^2 + f(n)$ for all $n$. The second equation says that for any function $g(n) \in \Theta(n)$
(such as the $f(n)$ just mentioned), there is some function $h(n) \in \Theta(n^2)$ such
that $2n^2 + g(n) = h(n)$ for all $n$. Note that this interpretation implies that
$2n^2 + 3n + 1 = \Theta(n^2)$, which is what the chaining of equations intuitively gives
us.

o-notation

The asymptotic upper bound provided by $O$-notation may or may not be asymptotically
tight. The bound $2n^2 = O(n^2)$ is asymptotically tight, but the bound
$2n = O(n^2)$ is not. We use $o$-notation to denote an upper bound that is not asymptotically
tight. We formally define $o(g(n))$ (“little-oh of g of n”) as the set

$$o(g(n)) = \{ f(n) : \text{for any positive constant } c > 0, \text{ there exists a constant } n_0 > 0 \text{ such that } 0 \le f(n) < c g(n) \text{ for all } n \ge n_0 \}.$$

For example, $2n = o(n^2)$, but $2n^2 \ne o(n^2)$.
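Both claims in this example follow directly from the definition. The short derivation below is a worked sketch added for illustration; the particular choice of $n_0$ is one of many that work.

```latex
% Claim 1: 2n = o(n^2).
% Fix any constant c > 0. For n >= 1,
%   2n < c n^2  \iff  n > 2/c,
% so the choice n_0 = \lfloor 2/c \rfloor + 1 gives
\[
  0 \le 2n < c\,n^2 \quad \text{for all } n \ge n_0 = \lfloor 2/c \rfloor + 1 .
\]

% Claim 2: 2n^2 \ne o(n^2).
% The definition must hold for every c > 0, but for c = 2 it would require
\[
  2n^2 < 2\,n^2 ,
\]
% which is false for every n, so no n_0 can exist for this c.
```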

The definitions of $O$-notation and $o$-notation are similar. The main difference
is that in $f(n) = O(g(n))$, the bound $0 \le f(n) \le c g(n)$ holds for some constant
$c > 0$, but in $f(n) = o(g(n))$, the bound $0 \le f(n) < c g(n)$ holds for all
constants $c > 0$. Intuitively, in $o$-notation, the function $f(n)$ becomes insignificant
relative to $g(n)$ as $n$ approaches infinity; that is,

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0. \tag{3.1}$$

Some authors use this limit as a definition of the $o$-notation; the definition in this
book also restricts the anonymous functions to be asymptotically nonnegative.


ω-notation


By analogy, $\omega$-notation is to $\Omega$-notation as $o$-notation is to $O$-notation. We use
$\omega$-notation to denote a lower bound that is not asymptotically tight. One way to
define it is by

$$f(n) \in \omega(g(n)) \text{ if and only if } g(n) \in o(f(n)).$$

Formally, however, we define $\omega(g(n))$ (“little-omega of g of n”) as the set

$$\omega(g(n)) = \{ f(n) : \text{for any positive constant } c > 0, \text{ there exists a constant } n_0 > 0 \text{ such that } 0 \le c g(n) < f(n) \text{ for all } n \ge n_0 \}.$$

For example, $n^2/2 = \omega(n)$, but $n^2/2 \ne \omega(n^2)$. The relation $f(n) = \omega(g(n))$
implies that

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty,$$

if the limit exists. That is, $f(n)$ becomes arbitrarily large relative to $g(n)$ as $n$
approaches infinity.


Comparing  functions 


Many  of  the  relational  properties  of  real  numbers  apply  to  asymptotic  comparisons 
as  well.  For  the  following,  assume  that  f(n)  and  g(n)  are  asymptotically  positive. 

Transitivity:

$$\begin{aligned}
f(n) = \Theta(g(n)) \text{ and } g(n) = \Theta(h(n)) &\text{ imply } f(n) = \Theta(h(n)), \\
f(n) = O(g(n)) \text{ and } g(n) = O(h(n)) &\text{ imply } f(n) = O(h(n)), \\
f(n) = \Omega(g(n)) \text{ and } g(n) = \Omega(h(n)) &\text{ imply } f(n) = \Omega(h(n)), \\
f(n) = o(g(n)) \text{ and } g(n) = o(h(n)) &\text{ imply } f(n) = o(h(n)), \\
f(n) = \omega(g(n)) \text{ and } g(n) = \omega(h(n)) &\text{ imply } f(n) = \omega(h(n)).
\end{aligned}$$

Reflexivity:

$$\begin{aligned}
f(n) &= \Theta(f(n)), \\
f(n) &= O(f(n)), \\
f(n) &= \Omega(f(n)).
\end{aligned}$$

Symmetry:

$$f(n) = \Theta(g(n)) \text{ if and only if } g(n) = \Theta(f(n)).$$

Transpose symmetry:

$$\begin{aligned}
f(n) = O(g(n)) &\text{ if and only if } g(n) = \Omega(f(n)), \\
f(n) = o(g(n)) &\text{ if and only if } g(n) = \omega(f(n)).
\end{aligned}$$

Because these properties hold for asymptotic notations, we can draw an analogy
between the asymptotic comparison of two functions $f$ and $g$ and the comparison
of two real numbers $a$ and $b$:

$$\begin{aligned}
f(n) = O(g(n)) &\quad\text{is like}\quad a \le b, \\
f(n) = \Omega(g(n)) &\quad\text{is like}\quad a \ge b, \\
f(n) = \Theta(g(n)) &\quad\text{is like}\quad a = b, \\
f(n) = o(g(n)) &\quad\text{is like}\quad a < b, \\
f(n) = \omega(g(n)) &\quad\text{is like}\quad a > b.
\end{aligned}$$

We say that $f(n)$ is asymptotically smaller than $g(n)$ if $f(n) = o(g(n))$, and $f(n)$
is asymptotically larger than $g(n)$ if $f(n) = \omega(g(n))$.

One property of real numbers, however, does not carry over to asymptotic notation:

Trichotomy: For any two real numbers $a$ and $b$, exactly one of the following must
hold: $a < b$, $a = b$, or $a > b$.

Although any two real numbers can be compared, not all functions are asymptotically
comparable. That is, for two functions $f(n)$ and $g(n)$, it may be the case
that neither $f(n) = O(g(n))$ nor $f(n) = \Omega(g(n))$ holds. For example, we cannot
compare the functions $n$ and $n^{1+\sin n}$ using asymptotic notation, since the value of
the exponent in $n^{1+\sin n}$ oscillates between 0 and 2, taking on all values in between.
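The oscillation can be observed numerically. The Python sketch below restricts attention to integer $n$ (an assumption made for the illustration) and uses hypothetical helper names; it shows the exponent $1 + \sin n$ coming close to both 0 and 2, so the ratio $n^{1+\sin n}/n = n^{\sin n}$ is sometimes far below 1 and sometimes far above it.

```python
import math

def exponent(n):
    """Exponent of n in the function n ** (1 + sin n)."""
    return 1 + math.sin(n)

def ratio_to_n(n):
    """The ratio n**(1 + sin n) / n, which simplifies to n ** sin(n)."""
    return n ** math.sin(n)

exponents = [exponent(n) for n in range(1, 101)]
lowest, highest = min(exponents), max(exponents)

small_ratio = ratio_to_n(11)   # sin(11) is close to -1, so the ratio is about 1/11
large_ratio = ratio_to_n(33)   # sin(33) is close to +1, so the ratio is about 33
```

Because such near-extreme exponents keep recurring as $n$ grows, no single constant multiple of $n$ bounds $n^{1+\sin n}$ from above or from below for all large $n$.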

Exercises 


3.1-1 

Let $f(n)$ and $g(n)$ be asymptotically nonnegative functions. Using the basic definition
of $\Theta$-notation, prove that $\max(f(n), g(n)) = \Theta(f(n) + g(n))$.


3.1-2

Show that for any real constants $a$ and $b$, where $b > 0$,

$$(n + a)^b = \Theta(n^b). \tag{3.2}$$

3.1-3

Explain why the statement, “The running time of algorithm A is at least $O(n^2)$,” is
meaningless.

3.1-4

Is $2^{n+1} = O(2^n)$? Is $2^{2n} = O(2^n)$?

3.1-5

Prove Theorem 3.1.

3.1-6

Prove that the running time of an algorithm is $\Theta(g(n))$ if and only if its worst-case
running time is $O(g(n))$ and its best-case running time is $\Omega(g(n))$.


3.1-7 

Prove that $o(g(n)) \cap \omega(g(n))$ is the empty set.


3.1-8

We can extend our notation to the case of two parameters $n$ and $m$ that can go to
infinity independently at different rates. For a given function $g(n, m)$, we denote
by $O(g(n, m))$ the set of functions

$$O(g(n, m)) = \{ f(n, m) : \text{there exist positive constants } c, n_0, \text{ and } m_0 \text{ such that } 0 \le f(n, m) \le c g(n, m) \text{ for all } n \ge n_0 \text{ or } m \ge m_0 \}.$$

Give corresponding definitions for $\Omega(g(n, m))$ and $\Theta(g(n, m))$.


3.2  Standard  notations  and  common  functions 

This section reviews some standard mathematical functions and notations and explores
the relationships among them. It also illustrates the use of the asymptotic
notations.

Monotonicity 

A function $f(n)$ is monotonically increasing if $m \le n$ implies $f(m) \le f(n)$.
Similarly, it is monotonically decreasing if $m \le n$ implies $f(m) \ge f(n)$. A
function $f(n)$ is strictly increasing if $m < n$ implies $f(m) < f(n)$ and strictly
decreasing if $m < n$ implies $f(m) > f(n)$.

Floors and ceilings


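For any real number $x$, we denote the greatest integer less than or equal to $x$ by $\lfloor x \rfloor$
(read “the floor of x”) and the least integer greater than or equal to $x$ by $\lceil x \rceil$ (read
“the ceiling of x”). For all real $x$,

$$x - 1 < \lfloor x \rfloor \le x \le \lceil x \rceil < x + 1. \tag{3.3}$$

For any integer $n$,

$$\lceil n/2 \rceil + \lfloor n/2 \rfloor = n,$$

and for any real number $x \ge 0$ and integers $a, b > 0$,

$$\left\lceil \frac{\lceil x/a \rceil}{b} \right\rceil = \left\lceil \frac{x}{ab} \right\rceil, \tag{3.4}$$

$$\left\lfloor \frac{\lfloor x/a \rfloor}{b} \right\rfloor = \left\lfloor \frac{x}{ab} \right\rfloor, \tag{3.5}$$

$$\left\lceil \frac{a}{b} \right\rceil \le \frac{a + (b - 1)}{b}, \tag{3.6}$$

$$\left\lfloor \frac{a}{b} \right\rfloor \ge \frac{a - (b - 1)}{b}. \tag{3.7}$$

The floor function $f(x) = \lfloor x \rfloor$ is monotonically increasing, as is the ceiling function
$f(x) = \lceil x \rceil$.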
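These floor and ceiling facts are easy to spot-check with exact rational arithmetic. The Python sketch below is an illustration only; the function names are hypothetical, and testing a finite set of values does not prove the identities.

```python
import math
from fractions import Fraction

def check_floor_ceiling_identities(x, a, b):
    """True if (3.3)-(3.7) hold for rational x >= 0 and integers a, b > 0."""
    x = Fraction(x)
    ok = x - 1 < math.floor(x) <= x <= math.ceil(x) < x + 1                       # (3.3)
    ok &= math.ceil(Fraction(math.ceil(x / a), b)) == math.ceil(x / (a * b))      # (3.4)
    ok &= math.floor(Fraction(math.floor(x / a), b)) == math.floor(x / (a * b))   # (3.5)
    ok &= math.ceil(Fraction(a, b)) <= Fraction(a + (b - 1), b)                   # (3.6)
    ok &= math.floor(Fraction(a, b)) >= Fraction(a - (b - 1), b)                  # (3.7)
    return bool(ok)

def halves_sum(n):
    """ceil(n/2) + floor(n/2) for an integer n; the text states this equals n."""
    half = Fraction(n, 2)
    return math.ceil(half) + math.floor(half)
```

`Fraction` is used so that values such as $x/a$ are represented exactly, avoiding floating-point rounding near integer boundaries.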


Modular  arithmetic 

For any integer $a$ and any positive integer $n$, the value $a \bmod n$ is the remainder
(or residue) of the quotient $a/n$:

$$a \bmod n = a - n \lfloor a/n \rfloor. \tag{3.8}$$

It follows that

$$0 \le a \bmod n < n. \tag{3.9}$$

Given a well-defined notion of the remainder of one integer when divided by another,
it is convenient to provide special notation to indicate equality of remainders.
If $(a \bmod n) = (b \bmod n)$, we write $a \equiv b \pmod{n}$ and say that $a$ is equivalent
to $b$, modulo $n$. In other words, $a \equiv b \pmod{n}$ if $a$ and $b$ have the same remainder
when divided by $n$. Equivalently, $a \equiv b \pmod{n}$ if and only if $n$ is a divisor
of $b - a$. We write $a \not\equiv b \pmod{n}$ if $a$ is not equivalent to $b$, modulo $n$.
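Definition (3.8) can be written directly in code. The following Python sketch relies on the fact that `//` on Python integers is floor division, so the result stays in the range given by (3.9) even for negative $a$; the names `mod` and `equivalent` are illustrative only.

```python
def mod(a, n):
    """a mod n per equation (3.8): a - n * floor(a / n), for integer a and integer n > 0."""
    return a - n * (a // n)   # // rounds toward negative infinity on Python ints

def equivalent(a, b, n):
    """True if a and b leave the same remainder when divided by n."""
    return mod(a, n) == mod(b, n)
```

For instance, `mod(-7, 3)` evaluates to 2, since $\lfloor -7/3 \rfloor = -3$ and $-7 - 3 \cdot (-3) = 2$. Languages whose integer division truncates toward zero would need an adjustment to match (3.8) for negative $a$.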


Polynomials

Given a nonnegative integer $d$, a polynomial in $n$ of degree $d$ is a function $p(n)$
of the form

$$p(n) = \sum_{i=0}^{d} a_i n^i,$$

where the constants $a_0, a_1, \ldots, a_d$ are the coefficients of the polynomial and
$a_d \ne 0$. A polynomial is asymptotically positive if and only if $a_d > 0$. For an
asymptotically positive polynomial $p(n)$ of degree $d$, we have $p(n) = \Theta(n^d)$. For
any real constant $a \ge 0$, the function $n^a$ is monotonically increasing, and for any
real constant $a \le 0$, the function $n^a$ is monotonically decreasing. We say that a
function $f(n)$ is polynomially bounded if $f(n) = O(n^k)$ for some constant $k$.

Exponentials 

For all real $a > 0$, $m$, and $n$, we have the following identities:

$$\begin{aligned}
a^0 &= 1, \\
a^1 &= a, \\
a^{-1} &= 1/a, \\
(a^m)^n &= a^{mn}, \\
(a^m)^n &= (a^n)^m, \\
a^m a^n &= a^{m+n}.
\end{aligned}$$

For all $n$ and $a \ge 1$, the function $a^n$ is monotonically increasing in $n$. When
convenient, we shall assume $0^0 = 1$.

We can relate the rates of growth of polynomials and exponentials by the following
fact. For all real constants $a$ and $b$ such that $a > 1$,

$$\lim_{n \to \infty} \frac{n^b}{a^n} = 0, \tag{3.10}$$

from which we can conclude that

$$n^b = o(a^n).$$

Thus, any exponential function with a base strictly greater than 1 grows faster than
any polynomial function.

Using $e$ to denote $2.71828\ldots$, the base of the natural logarithm function, we
have for all real $x$,

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{i=0}^{\infty} \frac{x^i}{i!}, \tag{3.11}$$

where “!” denotes the factorial function defined later in this section. For all real $x$,
we have the inequality

$$e^x \ge 1 + x, \tag{3.12}$$

where equality holds only when $x = 0$. When $|x| \le 1$, we have the approximation

$$1 + x \le e^x \le 1 + x + x^2. \tag{3.13}$$

When $x \to 0$, the approximation of $e^x$ by $1 + x$ is quite good:

$$e^x = 1 + x + \Theta(x^2).$$

(In this equation, the asymptotic notation is used to describe the limiting behavior
as $x \to 0$ rather than as $x \to \infty$.) We have for all $x$,

$$\lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n = e^x. \tag{3.14}$$


Logarithms 

We shall use the following notations:

$$\begin{aligned}
\lg n &= \log_2 n && \text{(binary logarithm)}, \\
\ln n &= \log_e n && \text{(natural logarithm)}, \\
\lg^k n &= (\lg n)^k && \text{(exponentiation)}, \\
\lg \lg n &= \lg(\lg n) && \text{(composition)}.
\end{aligned}$$

An important notational convention we shall adopt is that logarithm functions will
apply only to the next term in the formula, so that $\lg n + k$ will mean $(\lg n) + k$
and not $\lg(n + k)$. If we hold $b > 1$ constant, then for $n > 0$, the function $\log_b n$
is strictly increasing.

For all real $a > 0$, $b > 0$, $c > 0$, and $n$,

$$a = b^{\log_b a},$$

$$\log_c(ab) = \log_c a + \log_c b,$$

$$\log_b a^n = n \log_b a,$$

$$\log_b a = \frac{\log_c a}{\log_c b}, \tag{3.15}$$

$$\log_b(1/a) = -\log_b a,$$

$$\log_b a = \frac{1}{\log_a b},$$

$$a^{\log_b c} = c^{\log_b a}, \tag{3.16}$$

where, in each equation above, logarithm bases are not 1.


By equation (3.15), changing the base of a logarithm from one constant to another
changes the value of the logarithm by only a constant factor, and so we shall
often use the notation “$\lg n$” when we don’t care about constant factors, such as in
$O$-notation. Computer scientists find 2 to be the most natural base for logarithms
because so many algorithms and data structures involve splitting a problem into
two parts.

There is a simple series expansion for $\ln(1 + x)$ when $|x| < 1$:

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5} - \cdots.$$

We also have the following inequalities for $x > -1$:

$$\frac{x}{1 + x} \le \ln(1 + x) \le x, \tag{3.17}$$

where equality holds only for $x = 0$.

We say that a function $f(n)$ is polylogarithmically bounded if $f(n) = O(\lg^k n)$
for some constant $k$. We can relate the growth of polynomials and polylogarithms
by substituting $\lg n$ for $n$ and $2^a$ for $a$ in equation (3.10), yielding

$$\lim_{n \to \infty} \frac{\lg^b n}{(2^a)^{\lg n}} = \lim_{n \to \infty} \frac{\lg^b n}{n^a} = 0.$$

From this limit, we can conclude that

$$\lg^b n = o(n^a)$$

for any constant $a > 0$. Thus, any positive polynomial function grows faster than
any polylogarithmic function.


Factorials 

The notation $n!$ (read “n factorial”) is defined for integers $n \ge 0$ as

$$n! = \begin{cases} 1 & \text{if } n = 0, \\ n \cdot (n - 1)! & \text{if } n > 0. \end{cases}$$

Thus, $n! = 1 \cdot 2 \cdot 3 \cdots n$.

A weak upper bound on the factorial function is $n! \le n^n$, since each of the $n$
terms in the factorial product is at most $n$. Stirling’s approximation,

$$n! = \sqrt{2 \pi n} \left(\frac{n}{e}\right)^n \left(1 + \Theta\left(\frac{1}{n}\right)\right), \tag{3.18}$$

where $e$ is the base of the natural logarithm, gives us a tighter upper bound, and a
lower bound as well. As Exercise 3.2-3 asks you to prove,

$$n! = o(n^n),$$

$$n! = \omega(2^n),$$

$$\lg(n!) = \Theta(n \lg n), \tag{3.19}$$

where Stirling’s approximation is helpful in proving equation (3.19). The following
equation also holds for all $n \ge 1$:

$$n! = \sqrt{2 \pi n} \left(\frac{n}{e}\right)^n e^{\alpha_n}, \tag{3.20}$$

where

$$\frac{1}{12n + 1} < \alpha_n < \frac{1}{12n}. \tag{3.21}$$
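The bounds in (3.21) can be checked numerically for small $n$ by solving (3.20) for $\alpha_n$ in logarithmic form. The Python sketch below is illustrative only; `stirling_alpha` is a hypothetical name, and the check is limited to small $n$ because the gap between $\alpha_n$ and $1/(12n)$ shrinks quickly and would eventually fall below floating-point precision.

```python
import math

def stirling_alpha(n):
    """alpha_n from (3.20): ln(n!) - ln(sqrt(2*pi*n) * (n/e)**n), for integer n >= 1."""
    log_factorial = math.log(math.factorial(n))
    log_stirling = 0.5 * math.log(2 * math.pi * n) + n * (math.log(n) - 1)
    return log_factorial - log_stirling

def within_bounds(n):
    """True if 1/(12n + 1) < alpha_n < 1/(12n)."""
    return 1 / (12 * n + 1) < stirling_alpha(n) < 1 / (12 * n)
```

For $n = 1$ the computed value is about 0.0811, which lies between $1/13 \approx 0.0769$ and $1/12 \approx 0.0833$.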


Functional  iteration 


We use the notation $f^{(i)}(n)$ to denote the function $f(n)$ iteratively applied $i$ times
to an initial value of $n$. Formally, let $f(n)$ be a function over the reals. For nonnegative
integers $i$, we recursively define

$$f^{(i)}(n) = \begin{cases} n & \text{if } i = 0, \\ f(f^{(i-1)}(n)) & \text{if } i > 0. \end{cases}$$

For example, if $f(n) = 2n$, then $f^{(i)}(n) = 2^i n$.


The  iterated  logarithm  function 

We use the notation $\lg^* n$ (read “log star of n”) to denote the iterated logarithm, defined
as follows. Let $\lg^{(i)} n$ be as defined above, with $f(n) = \lg n$. Because the logarithm
of a nonpositive number is undefined, $\lg^{(i)} n$ is defined only if $\lg^{(i-1)} n > 0$.
Be sure to distinguish $\lg^{(i)} n$ (the logarithm function applied $i$ times in succession,
starting with argument $n$) from $\lg^i n$ (the logarithm of $n$ raised to the $i$th power).
Then we define the iterated logarithm function as

$$\lg^* n = \min \{ i \ge 0 : \lg^{(i)} n \le 1 \}.$$

The iterated logarithm is a very slowly growing function:

$$\begin{aligned}
\lg^* 2 &= 1, \\
\lg^* 4 &= 2, \\
\lg^* 16 &= 3, \\
\lg^* 65536 &= 4, \\
\lg^* (2^{65536}) &= 5.
\end{aligned}$$

Since the number of atoms in the observable universe is estimated to be about $10^{80}$,
which is much less than $2^{65536}$, we rarely encounter an input size $n$ such that
$\lg^* n > 5$.
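The definition of the iterated logarithm translates into a short loop: keep applying $\lg$ until the value is at most 1 and count the applications. The Python sketch below is illustrative; `iterated_log` is a hypothetical name, and it assumes $n \ge 1$ and relies on floating-point `math.log2`, which accepts arbitrarily large Python integers.

```python
import math

def iterated_log(n):
    """lg* n: the number of times lg must be applied before the value is at most 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)   # one more application of lg
        count += 1
    return count

values = [iterated_log(n) for n in (2, 4, 16, 65536, 2 ** 65536)]
```

Even for the 19,729-digit integer $2^{65536}$ the loop runs only five times, which illustrates how slowly the function grows.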


Fibonacci  numbers 


We  define  the  Fibonacci  numbers  by  the  following  recurrence: 

$$\begin{aligned}
F_0 &= 0, \\
F_1 &= 1, \\
F_i &= F_{i-1} + F_{i-2} \quad \text{for } i \ge 2.
\end{aligned} \tag{3.22}$$


Thus,  each  Fibonacci  number  is  the  sum  of  the  two  previous  ones,  yielding  the 
sequence 

0,  1,  1,  2,  3,  5,  8,  13,  21,  34,  55,  ...  . 

Fibonacci numbers are related to the golden ratio $\phi$ and to its conjugate $\hat\phi$, which
are the two roots of the equation

$$x^2 = x + 1 \tag{3.23}$$

and are given by the following formulas (see Exercise 3.2-6):

$$\begin{aligned}
\phi &= \frac{1 + \sqrt{5}}{2} = 1.61803\ldots, \\
\hat\phi &= \frac{1 - \sqrt{5}}{2} = -.61803\ldots.
\end{aligned} \tag{3.24}$$

Specifically, we have

\[ F_i = \frac{\phi^i - \hat{\phi}^i}{\sqrt{5}} , \]

which we can prove by induction (Exercise 3.2-7). Since \(|\hat{\phi}| < 1\), we have

\[ \frac{|\hat{\phi}^i|}{\sqrt{5}} < \frac{1}{\sqrt{5}} < \frac{1}{2} , \]

which implies that

\[ F_i = \left\lfloor \frac{\phi^i}{\sqrt{5}} + \frac{1}{2} \right\rfloor , \tag{3.25} \]

which is to say that the \(i\)th Fibonacci number \(F_i\) is equal to \(\phi^i / \sqrt{5}\) rounded to the nearest integer. Thus, Fibonacci numbers grow exponentially.
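Equation (3.25) is easy to test for small indices. The Python sketch below compares the recurrence (3.22) with the rounded closed form; the function names are our own, and because the closed form is evaluated in floating point, the comparison is only trustworthy for modest values of i (it is limited to i < 40 here).

```python
import math

def fib(i):
    """Compute F_i directly from the recurrence (3.22)."""
    a, b = 0, 1            # F_0 and F_1
    for _ in range(i):
        a, b = b, a + b
    return a

def fib_rounded(i):
    """Compute F_i as floor(phi^i / sqrt(5) + 1/2), as in equation (3.25)."""
    phi = (1 + math.sqrt(5)) / 2
    return math.floor(phi ** i / math.sqrt(5) + 0.5)

# The two computations agree for every index checked.
print(all(fib(i) == fib_rounded(i) for i in range(40)))
```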


Exercises 


3.2-1 

Show that if \(f(n)\) and \(g(n)\) are monotonically increasing functions, then so are the functions \(f(n) + g(n)\) and \(f(g(n))\), and if \(f(n)\) and \(g(n)\) are in addition nonnegative, then \(f(n) \cdot g(n)\) is monotonically increasing.


3.2-2

Prove equation (3.16).

3.2-3

Prove equation (3.19). Also prove that \(n! = \omega(2^n)\) and \(n! = o(n^n)\).

3.2-4 *

Is the function \(\lceil \lg n \rceil !\) polynomially bounded? Is the function \(\lceil \lg \lg n \rceil !\) polynomially bounded?

3.2-5 *

Which is asymptotically larger: \(\lg(\lg^* n)\) or \(\lg^*(\lg n)\)?


3.2-6

Show that the golden ratio \(\phi\) and its conjugate \(\hat{\phi}\) both satisfy the equation \(x^2 = x + 1\).


3.2-7 

Prove by induction that the \(i\)th Fibonacci number satisfies the equality

\[ F_i = \frac{\phi^i - \hat{\phi}^i}{\sqrt{5}} , \]

where \(\phi\) is the golden ratio and \(\hat{\phi}\) is its conjugate.

3.2-8

Show that \(k \ln k = \Theta(n)\) implies \(k = \Theta(n / \ln n)\).


Problems


3-1  Asymptotic  behavior  of  polynomials 
Let

\[ p(n) = \sum_{i=0}^{d} a_i n^i , \]

where \(a_d > 0\), be a degree-\(d\) polynomial in \(n\), and let \(k\) be a constant. Use the definitions of the asymptotic notations to prove the following properties.

a. If \(k \ge d\), then \(p(n) = O(n^k)\).

b. If \(k \le d\), then \(p(n) = \Omega(n^k)\).

c. If \(k = d\), then \(p(n) = \Theta(n^k)\).

d. If \(k > d\), then \(p(n) = o(n^k)\).

e. If \(k < d\), then \(p(n) = \omega(n^k)\).


3-2  Relative  asymptotic  growths 

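Indicate, for each pair of expressions \((A, B)\) in the table below, whether \(A\) is \(O\), \(o\), \(\Omega\), \(\omega\), or \(\Theta\) of \(B\). Assume that \(k \ge 1\), \(\epsilon > 0\), and \(c > 1\) are constants. Your answer should be in the form of the table with "yes" or "no" written in each box.

        A                 B                  O    o    Ω    ω    Θ
    a.  \(\lg^k n\)       \(n^\epsilon\)
    b.  \(n^k\)           \(c^n\)
    c.  \(\sqrt{n}\)      \(n^{\sin n}\)
    d.  \(2^n\)           \(2^{n/2}\)
    e.  \(n^{\lg c}\)     \(c^{\lg n}\)
    f.  \(\lg(n!)\)       \(\lg(n^n)\)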

3-3  Ordering  by  asymptotic  growth  rates 

a. Rank the following functions by order of growth; that is, find an arrangement \(g_1, g_2, \ldots, g_{30}\) of the functions satisfying \(g_1 = \Omega(g_2)\), \(g_2 = \Omega(g_3)\), ..., \(g_{29} = \Omega(g_{30})\). Partition your list into equivalence classes such that functions \(f(n)\) and \(g(n)\) are in the same class if and only if \(f(n) = \Theta(g(n))\).

    \(\lg(\lg^* n)\)   \(2^{\lg^* n}\)         \((\sqrt{2})^{\lg n}\)   \(n^2\)             \(n!\)         \((\lg n)!\)
    \((3/2)^n\)        \(n^3\)                 \(\lg^2 n\)              \(\lg(n!)\)         \(2^{2^n}\)    \(n^{1/\lg n}\)
    \(\ln \ln n\)      \(\lg^* n\)             \(n \cdot 2^n\)          \(n^{\lg \lg n}\)   \(\ln n\)      \(1\)
    \(2^{\lg n}\)      \((\lg n)^{\lg n}\)     \(e^n\)                  \(4^{\lg n}\)       \((n+1)!\)     \(\sqrt{\lg n}\)
    \(\lg^*(\lg n)\)   \(2^{\sqrt{2 \lg n}}\)  \(n\)                    \(2^n\)             \(n \lg n\)    \(2^{2^{n+1}}\)

b. Give an example of a single nonnegative function \(f(n)\) such that for all functions \(g_i(n)\) in part (a), \(f(n)\) is neither \(O(g_i(n))\) nor \(\Omega(g_i(n))\).

3-4  Asymptotic  notation  properties 

Let \(f(n)\) and \(g(n)\) be asymptotically positive functions. Prove or disprove each of the following conjectures.

a. \(f(n) = O(g(n))\) implies \(g(n) = O(f(n))\).

b. \(f(n) + g(n) = \Theta(\min(f(n), g(n)))\).

c. \(f(n) = O(g(n))\) implies \(\lg(f(n)) = O(\lg(g(n)))\), where \(\lg(g(n)) \ge 1\) and \(f(n) \ge 1\) for all sufficiently large \(n\).

d. \(f(n) = O(g(n))\) implies \(2^{f(n)} = O(2^{g(n)})\).

e. \(f(n) = O((f(n))^2)\).

f. \(f(n) = O(g(n))\) implies \(g(n) = \Omega(f(n))\).

g. \(f(n) = \Theta(f(n/2))\).

h. \(f(n) + o(f(n)) = \Theta(f(n))\).

3-5 Variations on O and Ω

Some authors define \(\Omega\) in a slightly different way than we do; let's use \(\overset{\infty}{\Omega}\) (read "omega infinity") for this alternative definition. We say that \(f(n) = \overset{\infty}{\Omega}(g(n))\) if there exists a positive constant \(c\) such that \(f(n) \ge c g(n) \ge 0\) for infinitely many integers \(n\).

a. Show that for any two functions \(f(n)\) and \(g(n)\) that are asymptotically nonnegative, either \(f(n) = O(g(n))\) or \(f(n) = \overset{\infty}{\Omega}(g(n))\) or both, whereas this is not true if we use \(\Omega\) in place of \(\overset{\infty}{\Omega}\).

b. Describe the potential advantages and disadvantages of using \(\overset{\infty}{\Omega}\) instead of \(\Omega\) to characterize the running times of programs.

Some authors also define \(O\) in a slightly different manner; let's use \(O'\) for the alternative definition. We say that \(f(n) = O'(g(n))\) if and only if \(|f(n)| = O(g(n))\).

c. What happens to each direction of the "if and only if" in Theorem 3.1 if we substitute \(O'\) for \(O\) but still use \(\Omega\)?

Some authors define \(\tilde{O}\) (read "soft-oh") to mean \(O\) with logarithmic factors ignored:

\[ \tilde{O}(g(n)) = \{ f(n) : \text{there exist positive constants } c, k, \text{ and } n_0 \text{ such that } 0 \le f(n) \le c g(n) \lg^k(n) \text{ for all } n \ge n_0 \} . \]

d. Define \(\tilde{\Omega}\) and \(\tilde{\Theta}\) in a similar manner. Prove the corresponding analog to Theorem 3.1.

3-6  Iterated  functions 

We can apply the iteration operator \({}^*\) used in the \(\lg^*\) function to any monotonically increasing function \(f(n)\) over the reals. For a given constant \(c \in \mathbb{R}\), we define the iterated function \(f_c^*\) by

\[ f_c^*(n) = \min \{ i \ge 0 : f^{(i)}(n) \le c \} , \]

which need not be well defined in all cases. In other words, the quantity \(f_c^*(n)\) is the number of iterated applications of the function \(f\) required to reduce its argument down to \(c\) or less.

For each of the following functions \(f(n)\) and constants \(c\), give as tight a bound as possible on \(f_c^*(n)\).

        f(n)             c    \(f_c^*(n)\)
    a.  \(n - 1\)        0
    b.  \(\lg n\)        1
    c.  \(n/2\)          1
    d.  \(n/2\)          2
    e.  \(\sqrt{n}\)     2
    f.  \(\sqrt{n}\)     1
    g.  \(n^{1/3}\)      2
    h.  \(n / \lg n\)    2
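Before looking for bounds, it can help to evaluate the iterated function on a few inputs. The Python sketch below follows the definition literally; the name `iterated_count` and the `max_steps` guard are our own additions (the guard exists because, as the problem notes, the quantity need not be well defined), and the sketch only produces sample values, not the asymptotic bounds the problem asks for.

```python
import math

def iterated_count(f, c, n, max_steps=10_000):
    """Return the least i >= 0 such that applying f to n i times gives a value <= c."""
    i = 0
    x = n
    while x > c:
        if i >= max_steps:
            # The iteration may never reach c; give up instead of looping forever.
            raise ValueError("iteration did not reach c within max_steps")
        x = f(x)
        i += 1
    return i

print(iterated_count(lambda x: x - 1, 0, 10))    # f(n) = n - 1, c = 0
print(iterated_count(math.log2, 1, 65536))       # f(n) = lg n,  c = 1
```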
Chapter notes

Knuth [209] traces the origin of the O-notation to a number-theory text by P. Bachmann in 1892. The o-notation was invented by E. Landau in 1909 for his discussion of the distribution of prime numbers. The Ω and Θ notations were advocated by Knuth [213] to correct the popular, but technically sloppy, practice in the literature of using O-notation for both upper and lower bounds. Many people continue to use the O-notation where the Θ-notation is more technically precise. Further discussion of the history and development of asymptotic notations appears in works by Knuth [209, 213] and Brassard and Bratley [54].

Not all authors define the asymptotic notations in the same way, although the various definitions agree in most common situations. Some of the alternative definitions encompass functions that are not asymptotically nonnegative, as long as their absolute values are appropriately bounded.

Equation  (3.20)  is  due  to  Robbins  [297].  Other  properties  of  elementary  math¬ 
ematical  functions  can  be  found  in  any  good  mathematical  reference,  such  as 
Abramowitz  and  Stegun  [1]  or  Zwillinger  [362],  or  in  a  calculus  book,  such  as 
Apostol  [18]  or  Thomas  et  al.  [334].  Knuth  [209]  and  Graham,  Knuth,  and  Patash- 
nik  [152]  contain  a  wealth  of  material  on  discrete  mathematics  as  used  in  computer 
science. 
Divide-and-Conquer


In  Section  2.3.1,  we  saw  how  merge  sort  serves  as  an  example  of  the  divide-and- 
conquer  paradigm.  Recall  that  in  divide-and-conquer,  we  solve  a  problem  recur¬ 
sively,  applying  three  steps  at  each  level  of  the  recursion: 

Divide  the  problem  into  a  number  of  subproblems  that  are  smaller  instances  of  the 
same  problem. 

Conquer  the  subproblems  by  solving  them  recursively.  If  the  subproblem  sizes  are 
small  enough,  however,  just  solve  the  subproblems  in  a  straightforward  manner. 

Combine  the  solutions  to  the  subproblems  into  the  solution  for  the  original  prob¬ 
lem. 

When  the  subproblems  are  large  enough  to  solve  recursively,  we  call  that  the  recur¬ 
sive  case.  Once  the  subproblems  become  small  enough  that  we  no  longer  recurse, 
we  say  that  the  recursion  “bottoms  out”  and  that  we  have  gotten  down  to  the  base 
case.  Sometimes,  in  addition  to  subproblems  that  are  smaller  instances  of  the  same 
problem,  we  have  to  solve  subproblems  that  are  not  quite  the  same  as  the  original 
problem. We consider solving such subproblems as part of the combine step.
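The three steps can be seen in merge sort, the example recalled at the start of this chapter. The Python sketch below is an illustration only: it is not the book's Merge-Sort pseudocode, the function name is our own, and it returns a new list rather than sorting in place.

```python
def merge_sort(values):
    """Sort a list with divide-and-conquer, labeling each of the three steps."""
    # Base case: the recursion bottoms out on lists of length 0 or 1.
    if len(values) <= 1:
        return list(values)

    # Divide: split the problem into two smaller instances of the same problem.
    mid = len(values) // 2

    # Conquer: solve each subproblem recursively.
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])

    # Combine: merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))
```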

In  this  chapter,  we  shall  see  more  algorithms  based  on  divide-and-conquer.  The 
first  one  solves  the  maximum-subarray  problem:  it  takes  as  input  an  array  of  num¬ 
bers,  and  it  determines  the  contiguous  subarray  whose  values  have  the  greatest  sum. 
Then we shall see two divide-and-conquer algorithms for multiplying \(n \times n\) matrices. One runs in \(\Theta(n^3)\) time, which is no better than the straightforward method of multiplying square matrices. But the other, Strassen's algorithm, runs in \(O(n^{2.81})\) time, which beats the straightforward method asymptotically.

Recurrences 

Recurrences go hand in hand with the divide-and-conquer paradigm, because they give us a natural way to characterize the running times of divide-and-conquer algorithms. A recurrence is an equation or inequality that describes a function in terms of its value on smaller inputs. For example, in Section 2.3.2 we described the worst-case running time \(T(n)\) of the Merge-Sort procedure by the recurrence

\[ T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 , \\ 2T(n/2) + \Theta(n) & \text{if } n > 1 , \end{cases} \tag{4.1} \]

whose solution we claimed to be \(T(n) = \Theta(n \lg n)\).
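To get a feel for this solution, we can evaluate a concrete instance of recurrence (4.1). The Python sketch below assumes, purely for illustration, that the constants hidden by the Θ-notation are 1, so that T(1) = 1 and the non-recursive cost is exactly n; it also restricts n to powers of 2 so that n/2 is always an integer.

```python
from functools import lru_cache
import math

@lru_cache(maxsize=None)
def T(n):
    """One concrete instance of recurrence (4.1): T(1) = 1, T(n) = 2 T(n/2) + n."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

# Under these assumptions T(n) works out to n lg n + n on powers of 2,
# which is consistent with the claimed Theta(n lg n) solution.
for k in range(1, 11):
    n = 2 ** k
    print(n, T(n), n * math.log2(n) + n)
```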

Recurrences can take many forms. For example, a recursive algorithm might divide subproblems into unequal sizes, such as a 2/3-to-1/3 split. If the divide and combine steps take linear time, such an algorithm would give rise to the recurrence \(T(n) = T(2n/3) + T(n/3) + \Theta(n)\).

Subproblems are not necessarily constrained to being a constant fraction of the original problem size. For example, a recursive version of linear search (see Exercise 2.1-3) would create just one subproblem containing only one element fewer than the original problem. Each recursive call would take constant time plus the time for the recursive calls it makes, yielding the recurrence \(T(n) = T(n - 1) + \Theta(1)\).
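A recursive linear search of this kind might look as follows in Python. This is a sketch under our own conventions (0-based indexing, a `start` parameter marking the unsearched suffix, and `None` for "not found"); it is not taken from Exercise 2.1-3. Note that Python limits recursion depth, so the sketch is suitable only for short lists.

```python
def linear_search(A, v, start=0):
    """Return an index i >= start with A[i] == v, or None if there is none."""
    if start == len(A):
        return None                        # empty subproblem: v was not found
    if A[start] == v:
        return start                       # constant work in this call
    return linear_search(A, v, start + 1)  # one subproblem, one element smaller

print(linear_search([31, 41, 59, 26, 41, 58], 59))
```

Each call does a constant amount of work and makes at most one recursive call on a problem with one element fewer, which is exactly the shape of the recurrence above.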

This chapter offers three methods for solving recurrences, that is, for obtaining asymptotic "Θ" or "O" bounds on the solution:

•  In  the  substitution  method,  we  guess  a  bound  and  then  use  mathematical  in¬ 
duction  to  prove  our  guess  correct. 

•  The  recursion-tree  method  converts  the  recurrence  into  a  tree  whose  nodes 
represent  the  costs  incurred  at  various  levels  of  the  recursion.  We  use  techniques 
for  bounding  summations  to  solve  the  recurrence. 

• The master method provides bounds for recurrences of the form

\[ T(n) = a T(n/b) + f(n) , \tag{4.2} \]

where \(a \ge 1\), \(b > 1\), and \(f(n)\) is a given function. Such recurrences arise frequently. A recurrence of the form in equation (4.2) characterizes a divide-and-conquer algorithm that creates \(a\) subproblems, each of which is \(1/b\) the size of the original problem, and in which the divide and combine steps together take \(f(n)\) time.

To  use  the  master  method,  you  will  need  to  memorize  three  cases,  but  once 
you  do  that,  you  will  easily  be  able  to  determine  asymptotic  bounds  for  many 
simple  recurrences.  We  will  use  the  master  method  to  determine  the  running 
times  of  the  divide-and-conquer  algorithms  for  the  maximum-subarray  problem 
and  for  matrix  multiplication,  as  well  as  for  other  algorithms  based  on  divide- 
and-conquer  elsewhere  in  this  book. 
Occasionally, we shall see recurrences that are not equalities but rather inequalities, such as \(T(n) \le 2T(n/2) + \Theta(n)\). Because such a recurrence states only an upper bound on \(T(n)\), we will couch its solution using O-notation rather than Θ-notation. Similarly, if the inequality were reversed to \(T(n) \ge 2T(n/2) + \Theta(n)\), then because the recurrence gives only a lower bound on \(T(n)\), we would use Ω-notation in its solution.


Technicalities  in  recurrences 


In practice, we neglect certain technical details when we state and solve recurrences. For example, if we call Merge-Sort on \(n\) elements when \(n\) is odd, we end up with subproblems of size \(\lfloor n/2 \rfloor\) and \(\lceil n/2 \rceil\). Neither size is actually \(n/2\), because \(n/2\) is not an integer when \(n\) is odd. Technically, the recurrence describing the worst-case running time of Merge-Sort is really

\[ T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 , \\ T(\lceil n/2 \rceil) + T(\lfloor n/2 \rfloor) + \Theta(n) & \text{if } n > 1 . \end{cases} \tag{4.3} \]


Boundary conditions represent another class of details that we typically ignore. Since the running time of an algorithm on a constant-sized input is a constant, the recurrences that arise from the running times of algorithms generally have \(T(n) = \Theta(1)\) for sufficiently small \(n\). Consequently, for convenience, we shall generally omit statements of the boundary conditions of recurrences and assume that \(T(n)\) is constant for small \(n\). For example, we normally state recurrence (4.1) as

\[ T(n) = 2T(n/2) + \Theta(n) , \tag{4.4} \]

without explicitly giving values for small \(n\). The reason is that although changing the value of \(T(1)\) changes the exact solution to the recurrence, the solution typically doesn't change by more than a constant factor, and so the order of growth is unchanged.

When  we  state  and  solve  recurrences,  we  often  omit  floors,  ceilings,  and  bound¬ 
ary  conditions.  We  forge  ahead  without  these  details  and  later  determine  whether 
or  not  they  matter.  They  usually  do  not,  but  you  should  know  when  they  do.  Ex¬ 
perience  helps,  and  so  do  some  theorems  stating  that  these  details  do  not  affect  the 
asymptotic  bounds  of  many  recurrences  characterizing  divide-and-conquer  algo¬ 
rithms  (see  Theorem  4.1).  In  this  chapter,  however,  we  shall  address  some  of  these 
details  and  illustrate  the  fine  points  of  recurrence  solution  methods. 
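The claim that floors and ceilings usually do not change the order of growth can be explored numerically for recurrence (4.3). The Python sketch below again assumes, for illustration only, that the hidden constants are 1 (so T(1) = 1 and the non-recursive cost is exactly n), and it compares T(n) with n lg n for sizes that are not powers of 2.

```python
from functools import lru_cache
import math

@lru_cache(maxsize=None)
def T(n):
    """Concrete instance of (4.3): T(1) = 1, T(n) = T(ceil(n/2)) + T(floor(n/2)) + n."""
    if n == 1:
        return 1
    return T((n + 1) // 2) + T(n // 2) + n   # (n + 1) // 2 is ceil(n/2) for integers

# The ratio T(n) / (n lg n) stays close to 1 even when n is not a power of 2.
for n in (3, 10, 100, 1000, 12345):
    print(n, T(n), round(T(n) / (n * math.log2(n)), 3))
```

For this particular instance, T(n) stays between n lg n and n lg n + 2n for every n checked, so rounding the subproblem sizes affects only lower-order terms.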
4.1 The maximum-subarray problem

Suppose that you have been offered the opportunity to invest in the Volatile Chemical
Corporation.  Like  the  chemicals  the  company  produces,  the  stock  price  of  the 
Volatile  Chemical  Corporation  is  rather  volatile.  You  are  allowed  to  buy  one  unit 
of  stock  only  one  time  and  then  sell  it  at  a  later  date,  buying  and  selling  after  the 
close  of  trading  for  the  day.  To  compensate  for  this  restriction,  you  are  allowed  to 
learn  what  the  price  of  the  stock  will  be  in  the  future.  Your  goal  is  to  maximize 
your  profit.  Figure  4.1  shows  the  price  of  the  stock  over  a  17-day  period.  You 
may buy the stock at any one time, starting after day 0, when the price is $100
per  share.  Of  course,  you  would  want  to  “buy  low,  sell  high”— buy  at  the  lowest 
possible  price  and  later  on  sell  at  the  highest  possible  price— to  maximize  your 
profit.  Unfortunately,  you  might  not  be  able  to  buy  at  the  lowest  price  and  then  sell 
at  the  highest  price  within  a  given  period.  In  Figure  4.1,  the  lowest  price  occurs 
after  day  7,  which  occurs  after  the  highest  price,  after  day  1. 

You  might  think  that  you  can  always  maximize  profit  by  either  buying  at  the 
lowest  price  or  selling  at  the  highest  price.  For  example,  in  Figure  4.1,  we  would 
maximize  profit  by  buying  at  the  lowest  price,  after  day  7.  If  this  strategy  always 
worked,  then  it  would  be  easy  to  determine  how  to  maximize  profit:  find  the  highest 
and  lowest  prices,  and  then  work  left  from  the  highest  price  to  find  the  lowest  prior 
price,  work  right  from  the  lowest  price  to  find  the  highest  later  price,  and  take 
the pair with the greater difference. Figure 4.2 shows a simple counterexample, demonstrating that the maximum profit sometimes comes neither by buying at the lowest price nor by selling at the highest price.

Figure 4.1 Information about the price of stock in the Volatile Chemical Corporation after the close of trading over a period of 17 days. The horizontal axis of the chart indicates the day, and the vertical axis shows the price. The bottom row of the table gives the change in price from the previous day.

Figure 4.2 An example showing that the maximum profit does not always start at the lowest price or end at the highest price. Again, the horizontal axis indicates the day, and the vertical axis shows the price. Here, the maximum profit of $3 per share would be earned by buying after day 2 and selling after day 3. The price of $7 after day 2 is not the lowest price overall, and the price of $10 after day 3 is not the highest price overall.

A  brute-force  solution 

We can easily devise a brute-force solution to this problem: just try every possible pair of buy and sell dates in which the buy date precedes the sell date. A period of \(n\) days has \(\binom{n}{2}\) such pairs of dates. Since \(\binom{n}{2}\) is \(\Theta(n^2)\), and the best we can hope for is to evaluate each pair of dates in constant time, this approach would take \(\Omega(n^2)\) time. Can we do better?
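The brute-force idea can be sketched directly in Python. The function name and the 0-based day numbering are our own, and the price list is a hypothetical one chosen to agree with the caption of Figure 4.2 (a price of $7 after day 2 and $10 after day 3, neither of them the overall extreme); the figure's actual data are not reproduced here.

```python
def best_trade_brute_force(prices):
    """Try every (buy, sell) pair with buy before sell; return (buy, sell, profit)."""
    best = None
    for buy in range(len(prices) - 1):
        for sell in range(buy + 1, len(prices)):
            profit = prices[sell] - prices[buy]   # constant work per pair
            if best is None or profit > best[2]:
                best = (buy, sell, profit)
    return best

# Hypothetical prices after days 0..4, consistent with the Figure 4.2 caption.
prices = [10, 11, 7, 10, 6]
print(best_trade_brute_force(prices))   # (2, 3, 3)
```

The two nested loops visit each pair of dates exactly once, which is where the quadratic running time comes from.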

A  transformation 

In order to design an algorithm with an \(o(n^2)\) running time, we will look at the
input  in  a  slightly  different  way.  We  want  to  find  a  sequence  of  days  over  which 
the  net  change  from  the  first  day  to  the  last  is  maximum.  Instead  of  looking  at  the 
daily  prices,  let  us  instead  consider  the  daily  change  in  price,  where  the  change  on 
day  i  is  the  difference  between  the  prices  after  day  i  —  1  and  after  day  i .  The  table 
in  Figure  4.1  shows  these  daily  changes  in  the  bottom  row.  If  we  treat  this  row  as 
an  array  A,  shown  in  Figure  4.3,  we  now  want  to  find  the  nonempty,  contiguous 
subarray  of  A  whose  values  have  the  largest  sum.  We  call  this  contiguous  subarray 
the  maximum  subarray.  For  example,  in  the  array  of  Figure  4.3,  the  maximum 
subarray of A[1..16] is A[8..11], with the sum 43. Thus, you would want to buy
the  stock  just  before  day  8  (that  is,  after  day  7)  and  sell  it  after  day  11,  earning  a 
profit  of  $43  per  share. 
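The transformation from prices to daily changes is short to express in Python. In the sketch below the function names are our own and the price list is hypothetical (it is consistent with the caption of Figure 4.2, not copied from a figure). The check at the end illustrates why the transformation is valid: the sum of the changes on days i through j equals the price after day j minus the price after day i − 1.

```python
def daily_changes(prices):
    """changes[k] is the change on day k + 1: prices[k + 1] - prices[k]."""
    return [prices[i] - prices[i - 1] for i in range(1, len(prices))]

def net_change(changes, i, j):
    """Sum of the changes on days i..j (days are numbered from 1)."""
    return sum(changes[i - 1:j])

prices = [10, 11, 7, 10, 6]          # hypothetical prices after days 0..4
changes = daily_changes(prices)      # [1, -4, 3, -4]
print(changes)
print(net_change(changes, 3, 3), prices[3] - prices[2])   # both are 3
```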

At first glance, this transformation does not help. We still need to check \(\binom{n-1}{2} = \Theta(n^2)\) subarrays for a period of \(n\) days. Exercise 4.1-2 asks you to show that although computing the cost of one subarray might take time proportional to the length of the subarray, when computing all \(\Theta(n^2)\) subarray sums, we can organize the computation so that each subarray sum takes \(O(1)\) time, given the values of previously computed subarray sums, so that the brute-force solution takes \(\Theta(n^2)\) time.

Figure 4.3 The change in stock prices as a maximum-subarray problem. Here, the subarray A[8..11], with sum 43, has the greatest sum of any contiguous subarray of array A.
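One way to organize the computation so that each subarray sum costs constant time is to extend a running sum by one element at a time. The Python sketch below is one possible approach, not necessarily the one Exercise 4.1-2 has in mind. It uses 0-based indices, and the 16-entry array is an assumption on our part: the figure's table is not available in this text, so the values were chosen to agree with the stated result (a maximum subarray A[8..11] with sum 43 in 1-based indexing).

```python
def max_subarray_brute_force(A):
    """Return (i, j, total) for a maximum nonempty subarray A[i..j], 0-based."""
    best = (0, 0, A[0])
    for i in range(len(A)):
        running = 0
        for j in range(i, len(A)):
            running += A[j]          # sum of A[i..j] from the sum of A[i..j-1]
            if running > best[2]:
                best = (i, j, running)
    return best

# Assumed daily changes, consistent with the result quoted in the text.
A = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
print(max_subarray_brute_force(A))   # (7, 10, 43), i.e. A[8..11] when counting from 1
```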

So  let  us  seek  a  more  efficient  solution  to  the  maximum-subarray  problem. 
When  doing  so,  we  will  usually  speak  of  “a”  maximum  subarray  rather  than  “the” 
maximum  subarray,  since  there  could  be  more  than  one  subarray  that  achieves  the 
maximum  sum. 

The  maximum-subarray  problem  is  interesting  only  when  the  array  contains 
some  negative  numbers.  If  all  the  array  entries  were  nonnegative,  then  the 
maximum-subarray  problem  would  present  no  challenge,  since  the  entire  array 
would  give  the  greatest  sum. 

A  solution  using  divide-and-conquer 

Let’s  think  about  how  we  might  solve  the  maximum-subarray  problem  using 
the divide-and-conquer technique. Suppose we want to find a maximum subarray of the subarray A[low..high]. Divide-and-conquer suggests that we divide the subarray into two subarrays of as equal size as possible. That is, we find the midpoint, say mid, of the subarray, and consider the subarrays A[low..mid] and A[mid + 1..high]. As Figure 4.4(a) shows, any contiguous subarray A[i..j] of A[low..high] must lie in exactly one of the following places:

• entirely in the subarray A[low..mid], so that \(\mathit{low} \le i \le j \le \mathit{mid}\),

• entirely in the subarray A[mid + 1..high], so that \(\mathit{mid} < i \le j \le \mathit{high}\), or

• crossing the midpoint, so that \(\mathit{low} \le i \le \mathit{mid} < j \le \mathit{high}\).

Therefore, a maximum subarray of A[low..high] must lie in exactly one of these places. In fact, a maximum subarray of A[low..high] must have the greatest sum over all subarrays entirely in A[low..mid], entirely in A[mid + 1..high], or crossing the midpoint. We can find maximum subarrays of A[low..mid] and A[mid + 1..high] recursively, because these two subproblems are smaller instances of the problem of finding a maximum subarray. Thus, all that is left to do is find a maximum subarray that crosses the midpoint, and take a subarray with the largest sum of the three.

Figure 4.4 (a) Possible locations of subarrays of A[low..high]: entirely in A[low..mid], entirely in A[mid + 1..high], or crossing the midpoint mid. (b) Any subarray of A[low..high] crossing the midpoint comprises two subarrays A[i..mid] and A[mid + 1..j], where \(\mathit{low} \le i \le \mathit{mid}\) and \(\mathit{mid} < j \le \mathit{high}\).

We can easily find a maximum subarray crossing the midpoint in time linear in the size of the subarray A[low..high]. This problem is not a smaller instance of our original problem, because it has the added restriction that the subarray it chooses must cross the midpoint. As Figure 4.4(b) shows, any subarray crossing the midpoint is itself made of two subarrays A[i..mid] and A[mid + 1..j], where \(\mathit{low} \le i \le \mathit{mid}\) and \(\mathit{mid} < j \le \mathit{high}\). Therefore, we just need to find maximum subarrays of the form A[i..mid] and A[mid + 1..j] and then combine them. The procedure Find-Max-Crossing-Subarray takes as input the array A and the indices low, mid, and high, and it returns a tuple containing the indices demarcating a maximum subarray that crosses the midpoint, along with the sum of the values in a maximum subarray.

Find-Max-Crossing-Subarray(A, low, mid, high)
 1  left-sum = −∞
 2  sum = 0
 3  for i = mid downto low
 4      sum = sum + A[i]
 5      if sum > left-sum
 6          left-sum = sum
 7          max-left = i
 8  right-sum = −∞
 9  sum = 0
10  for j = mid + 1 to high
11      sum = sum + A[j]
12      if sum > right-sum
13          right-sum = sum
14          max-right = j
15  return (max-left, max-right, left-sum + right-sum)
This procedure works as follows. Lines 1-7 find a maximum subarray of the left half, A[low..mid]. Since this subarray must contain A[mid], the for loop of lines 3-7 starts the index i at mid and works down to low, so that every subarray it considers is of the form A[i..mid]. Lines 1-2 initialize the variables left-sum, which holds the greatest sum found so far, and sum, holding the sum of the entries in A[i..mid]. Whenever we find, in line 5, a subarray A[i..mid] with a sum of values greater than left-sum, we update left-sum to this subarray's sum in line 6, and in line 7 we update the variable max-left to record this index i. Lines 8-14 work analogously for the right half, A[mid + 1..high]. Here, the for loop of lines 10-14 starts the index j at mid + 1 and works up to high, so that every subarray it considers is of the form A[mid + 1..j]. Finally, line 15 returns the indices max-left and max-right that demarcate a maximum subarray crossing the midpoint, along with the sum left-sum + right-sum of the values in the subarray A[max-left..max-right].
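A Python rendering of this procedure might look as follows. It is a sketch rather than a definitive translation: it uses 0-based indices, spells names with underscores, and uses `math.inf` in place of ∞. The sample array is an assumption, chosen to agree with the result quoted earlier in the section (a maximum subarray of sum 43 occupying positions 8 through 11 when counting from 1).

```python
import math

def find_max_crossing_subarray(A, low, mid, high):
    """Return (max_left, max_right, total) for a maximum subarray crossing mid."""
    # Best subarray of the form A[i..mid], scanning leftward from mid.
    left_sum = -math.inf
    total = 0
    max_left = mid
    for i in range(mid, low - 1, -1):
        total += A[i]
        if total > left_sum:
            left_sum = total
            max_left = i
    # Best subarray of the form A[mid+1..j], scanning rightward from mid + 1.
    right_sum = -math.inf
    total = 0
    max_right = mid + 1
    for j in range(mid + 1, high + 1):
        total += A[j]
        if total > right_sum:
            right_sum = total
            max_right = j
    return max_left, max_right, left_sum + right_sum

A = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
print(find_max_crossing_subarray(A, 0, 7, 15))   # (7, 10, 43)
```

The function assumes low ≤ mid < high, so that both halves are nonempty, which is how the recursive procedure calls it.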

If the subarray A[low..high] contains \(n\) entries (so that \(n = \mathit{high} - \mathit{low} + 1\)), we claim that the call Find-Max-Crossing-Subarray(A, low, mid, high) takes \(\Theta(n)\) time. Since each iteration of each of the two for loops takes \(\Theta(1)\) time, we just need to count up how many iterations there are altogether. The for loop of lines 3-7 makes \(\mathit{mid} - \mathit{low} + 1\) iterations, and the for loop of lines 10-14 makes \(\mathit{high} - \mathit{mid}\) iterations, and so the total number of iterations is

\[ \begin{aligned} (\mathit{mid} - \mathit{low} + 1) + (\mathit{high} - \mathit{mid}) &= \mathit{high} - \mathit{low} + 1 \\ &= n . \end{aligned} \]

With  a  linear-time  Find-Max-Crossing-Subarray  procedure  in  hand,  we 
can  write  pseudocode  for  a  divide-and-conquer  algorithm  to  solve  the  maximum- 
subarray  problem: 

Find-Maximum-Subarray(A, low, high)
 1  if high == low
 2      return (low, high, A[low])            // base case: only one element
 3  else mid = ⌊(low + high)/2⌋
 4      (left-low, left-high, left-sum) =
            Find-Maximum-Subarray(A, low, mid)
 5      (right-low, right-high, right-sum) =
            Find-Maximum-Subarray(A, mid + 1, high)
 6      (cross-low, cross-high, cross-sum) =
            Find-Max-Crossing-Subarray(A, low, mid, high)
 7      if left-sum ≥ right-sum and left-sum ≥ cross-sum
 8          return (left-low, left-high, left-sum)
 9      elseif right-sum ≥ left-sum and right-sum ≥ cross-sum
10          return (right-low, right-high, right-sum)
11      else return (cross-low, cross-high, cross-sum)
The initial call Find-Maximum-Subarray(A, 1, A.length) will find a maximum subarray of A[1..n].

Similar to Find-Max-Crossing-Subarray, the recursive procedure Find-Maximum-Subarray returns a tuple containing the indices that demarcate a maximum subarray, along with the sum of the values in a maximum subarray. Line 1 tests for the base case, where the subarray has just one element. A subarray with just one element has only one subarray, itself, and so line 2 returns a tuple with the starting and ending indices of just the one element, along with its value. Lines 3-11 handle the recursive case. Line 3 does the divide part, computing the index mid of the midpoint. Let's refer to the subarray A[low..mid] as the left subarray and to A[mid + 1..high] as the right subarray. Because we know that the subarray A[low..high] contains at least two elements, each of the left and right subarrays must have at least one element. Lines 4 and 5 conquer by recursively finding maximum subarrays within the left and right subarrays, respectively. Lines 6-11 form the combine part. Line 6 finds a maximum subarray that crosses the midpoint. (Recall that because line 6 solves a subproblem that is not a smaller instance of the original problem, we consider it to be in the combine part.) Line 7 tests whether the left subarray contains a subarray with the maximum sum, and line 8 returns that maximum subarray. Otherwise, line 9 tests whether the right subarray contains a subarray with the maximum sum, and line 10 returns that maximum subarray. If neither the left nor the right subarray contains a subarray achieving the maximum sum, then a maximum subarray must cross the midpoint, and line 11 returns it.
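The recursive procedure can be sketched in Python as follows. As before, this is an illustration under our own conventions (0-based indices, underscore names, `math.inf` for ∞), and the sample array is an assumption chosen to agree with the result quoted earlier in the section. A compact version of the crossing helper is included only so that the sketch runs on its own.

```python
import math

def find_max_crossing_subarray(A, low, mid, high):
    """Best subarray A[i..j] with low <= i <= mid < j <= high."""
    left_sum, total, max_left = -math.inf, 0, mid
    for i in range(mid, low - 1, -1):
        total += A[i]
        if total > left_sum:
            left_sum, max_left = total, i
    right_sum, total, max_right = -math.inf, 0, mid + 1
    for j in range(mid + 1, high + 1):
        total += A[j]
        if total > right_sum:
            right_sum, max_right = total, j
    return max_left, max_right, left_sum + right_sum

def find_maximum_subarray(A, low, high):
    """Return (i, j, total) for a maximum nonempty subarray of A[low..high]."""
    if high == low:
        return low, high, A[low]                  # base case: only one element
    mid = (low + high) // 2                       # divide
    left = find_maximum_subarray(A, low, mid)     # conquer the left subarray
    right = find_maximum_subarray(A, mid + 1, high)   # conquer the right subarray
    cross = find_max_crossing_subarray(A, low, mid, high)   # combine
    if left[2] >= right[2] and left[2] >= cross[2]:
        return left
    elif right[2] >= left[2] and right[2] >= cross[2]:
        return right
    else:
        return cross

A = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
print(find_maximum_subarray(A, 0, len(A) - 1))   # (7, 10, 43)
```

With 0-based indexing the initial call passes 0 and len(A) − 1 instead of 1 and A.length.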

Analyzing  the  divide-and-conquer  algorithm 

Next we set up a recurrence that describes the running time of the recursive Find-Maximum-Subarray procedure. As we did when we analyzed merge sort in Section 2.3.2, we make the simplifying assumption that the original problem size is a power of 2, so that all subproblem sizes are integers. We denote by \(T(n)\) the running time of Find-Maximum-Subarray on a subarray of \(n\) elements. For starters, line 1 takes constant time. The base case, when \(n = 1\), is easy: line 2 takes constant time, and so

\[ T(1) = \Theta(1) . \tag{4.5} \]

The recursive case occurs when \(n > 1\). Lines 1 and 3 take constant time. Each of the subproblems solved in lines 4 and 5 is on a subarray of \(n/2\) elements (our assumption that the original problem size is a power of 2 ensures that \(n/2\) is an integer), and so we spend \(T(n/2)\) time solving each of them. Because we have to solve two subproblems, one for the left subarray and one for the right subarray, the contribution to the running time from lines 4 and 5 comes to \(2T(n/2)\). As we have already seen, the call to Find-Max-Crossing-Subarray in line 6 takes \(\Theta(n)\) time. Lines 7-11 take only \(\Theta(1)\) time. For the recursive case, therefore, we have

\[ \begin{aligned} T(n) &= \Theta(1) + 2T(n/2) + \Theta(n) + \Theta(1) \\ &= 2T(n/2) + \Theta(n) . \end{aligned} \tag{4.6} \]

Combining equations (4.5) and (4.6) gives us a recurrence for the running time \(T(n)\) of Find-Maximum-Subarray:

\[ T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 , \\ 2T(n/2) + \Theta(n) & \text{if } n > 1 . \end{cases} \tag{4.7} \]

This recurrence is the same as recurrence (4.1) for merge sort. As we shall see from the master method in Section 4.5, this recurrence has the solution \(T(n) = \Theta(n \lg n)\). You might also revisit the recursion tree in Figure 2.5 to understand why the solution should be \(T(n) = \Theta(n \lg n)\).

Thus,  we  see  that  the  divide-and-conquer  method  yields  an  algorithm  that  is 
asymptotically  faster  than  the  brute-force  method.  With  merge  sort  and  now  the 
maximum-subarray  problem,  we  begin  to  get  an  idea  of  how  powerful  the  divide- 
and-conquer  method  can  be.  Sometimes  it  will  yield  the  asymptotically  fastest 
algorithm  for  a  problem,  and  other  times  we  can  do  even  better.  As  Exercise  4.1-5 
shows,  there  is  in  fact  a  linear-time  algorithm  for  the  maximum-subarray  problem, 
and  it  does  not  use  divide-and-conquer. 
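As an illustration of the procedure analyzed above, here is a minimal Python sketch of the divide-and-conquer approach. It is an assumption-laden rendering, not the book's pseudocode: the function names are ours, arrays are 0-indexed Python lists, and results are returned as `(low, high, sum)` tuples with inclusive bounds.

```python
def find_max_crossing_subarray(A, low, mid, high):
    """Best subarray A[i..j] with low <= i <= mid < j <= high (inclusive)."""
    left_sum, total, max_left = float("-inf"), 0, mid
    for i in range(mid, low - 1, -1):      # grow leftward from the midpoint
        total += A[i]
        if total > left_sum:
            left_sum, max_left = total, i
    right_sum, total, max_right = float("-inf"), 0, mid + 1
    for j in range(mid + 1, high + 1):     # grow rightward from mid + 1
        total += A[j]
        if total > right_sum:
            right_sum, max_right = total, j
    return max_left, max_right, left_sum + right_sum


def find_maximum_subarray(A, low, high):
    """Return (i, j, sum) for a maximum subarray of A[low..high]."""
    if low == high:                        # base case: a single element
        return low, high, A[low]
    mid = (low + high) // 2
    left = find_maximum_subarray(A, low, mid)          # T(n/2)
    right = find_maximum_subarray(A, mid + 1, high)    # T(n/2)
    cross = find_max_crossing_subarray(A, low, mid, high)  # Theta(n)
    # max() returns the first maximal candidate, so ties favor left, then right
    return max(left, right, cross, key=lambda t: t[2])
```

Calling `find_maximum_subarray(A, 0, len(A) - 1)` covers the whole array. The two recursive calls and the linear-time crossing scan correspond to the $2T(n/2)$ and $\Theta(n)$ terms of recurrence (4.7).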


Exercises 


4.1-1 

What does Find-Maximum-Subarray return when all elements of A are negative?


4.1-2 

Write pseudocode for the brute-force method of solving the maximum-subarray
problem. Your procedure should run in $\Theta(n^2)$ time.


4.1-3 

Implement both the brute-force and recursive algorithms for the maximum-subarray
problem on your own computer. What problem size $n_0$ gives the crossover
point at which the recursive algorithm beats the brute-force algorithm? Then,
change the base case of the recursive algorithm to use the brute-force algorithm
whenever the problem size is less than $n_0$. Does that change the crossover point?


4.1-4 

Suppose we change the definition of the maximum-subarray problem to allow the
result to be an empty subarray, where the sum of the values of an empty subarray
is 0. How would you change any of the algorithms that do not allow empty
subarrays to permit an empty subarray to be the result?


4.1-5 

Use the following ideas to develop a nonrecursive, linear-time algorithm for the
maximum-subarray problem. Start at the left end of the array, and progress toward
the right, keeping track of the maximum subarray seen so far. Knowing a maximum
subarray of $A[1 \,..\, j]$, extend the answer to find a maximum subarray ending at index
$j + 1$ by using the following observation: a maximum subarray of $A[1 \,..\, j + 1]$
is either a maximum subarray of $A[1 \,..\, j]$ or a subarray $A[i \,..\, j + 1]$, for some
$1 \le i \le j + 1$. Determine a maximum subarray of the form $A[i \,..\, j + 1]$ in
constant time based on knowing a maximum subarray ending at index $j$.


4.2 Strassen’s algorithm for matrix multiplication

If you have seen matrices before, then you probably know how to multiply them.
(Otherwise, you should read Section D.1 in Appendix D.) If $A = (a_{ij})$ and
$B = (b_{ij})$ are square $n \times n$ matrices, then in the product $C = A \cdot B$, we define the
entry $c_{ij}$, for $i, j = 1, 2, \ldots, n$, by

$$c_{ij} = \sum_{k=1}^{n} a_{ik} \cdot b_{kj} . \tag{4.8}$$

We must compute $n^2$ matrix entries, and each is the sum of n values. The following
procedure takes $n \times n$ matrices A and B and multiplies them, returning their $n \times n$
product C. We assume that each matrix has an attribute rows, giving the number
of rows in the matrix.

Square-Matrix-Multiply(A, B)
1  n = A.rows
2  let C be a new n × n matrix
3  for i = 1 to n
4      for j = 1 to n
5          c_ij = 0
6          for k = 1 to n
7              c_ij = c_ij + a_ik · b_kj
8  return C

The Square-Matrix-Multiply procedure works as follows. The for loop
of lines 3-7 computes the entries of each row i, and within a given row i, the
for loop of lines 4-7 computes each of the entries $c_{ij}$, for each column j. Line 5
initializes $c_{ij}$ to 0 as we start computing the sum given in equation (4.8), and each
iteration of the for loop of lines 6-7 adds in one more term of equation (4.8).

Because each of the triply-nested for loops runs exactly n iterations, and each
execution of line 7 takes constant time, the Square-Matrix-Multiply procedure
takes $\Theta(n^3)$ time.
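The triply nested loop can be sketched in Python as follows. This is only an illustrative translation under our own assumptions: matrices are lists of rows, indices are 0-based, and `len(A)` stands in for the attribute `A.rows`.

```python
def square_matrix_multiply(A, B):
    """Multiply two n x n matrices given as lists of rows."""
    n = len(A)                             # plays the role of A.rows
    C = [[0] * n for _ in range(n)]        # new n x n matrix, entries start at 0
    for i in range(n):
        for j in range(n):
            for k in range(n):             # add one term of the sum in (4.8)
                C[i][j] += A[i][k] * B[k][j]
    return C
```

Each of the three loops runs n times, so the innermost statement executes $n^3$ times, matching the $\Theta(n^3)$ bound.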

You might at first think that any matrix multiplication algorithm must take $\Omega(n^3)$
time, since the natural definition of matrix multiplication requires that many
multiplications. You would be incorrect, however: we have a way to multiply matrices
in $o(n^3)$ time. In this section, we shall see Strassen’s remarkable recursive
algorithm for multiplying $n \times n$ matrices. It runs in $\Theta(n^{\lg 7})$ time, which we shall show
in Section 4.5. Since $\lg 7$ lies between 2.80 and 2.81, Strassen’s algorithm runs in
$O(n^{2.81})$ time, which is asymptotically better than the simple Square-Matrix-Multiply
procedure.

A  simple  divide-and-conquer  algorithm 

To keep things simple, when we use a divide-and-conquer algorithm to compute
the matrix product $C = A \cdot B$, we assume that n is an exact power of 2 in each of
the $n \times n$ matrices. We make this assumption because in each divide step, we will
divide $n \times n$ matrices into four $n/2 \times n/2$ matrices, and by assuming that n is an
exact power of 2, we are guaranteed that as long as $n \ge 2$, the dimension $n/2$ is an
integer.

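Suppose that we partition each of A, B, and C into four $n/2 \times n/2$ matrices

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} , \quad
B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} , \quad
C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} , \tag{4.9}$$

so that we rewrite the equation $C = A \cdot B$ as

$$\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} =
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \cdot
\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} . \tag{4.10}$$

Equation (4.10) corresponds to the four equations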


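$$C_{11} = A_{11} \cdot B_{11} + A_{12} \cdot B_{21} , \tag{4.11}$$
$$C_{12} = A_{11} \cdot B_{12} + A_{12} \cdot B_{22} , \tag{4.12}$$
$$C_{21} = A_{21} \cdot B_{11} + A_{22} \cdot B_{21} , \tag{4.13}$$
$$C_{22} = A_{21} \cdot B_{12} + A_{22} \cdot B_{22} . \tag{4.14}$$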


Each of these four equations specifies two multiplications of $n/2 \times n/2$ matrices
and the addition of their $n/2 \times n/2$ products. We can use these equations to create
a straightforward, recursive, divide-and-conquer algorithm:

Square-Matrix-Multiply-Recursive(A, B)
 1  n = A.rows
 2  let C be a new n × n matrix
 3  if n == 1
 4      c_11 = a_11 · b_11
 5  else partition A, B, and C as in equations (4.9)
 6      C_11 = Square-Matrix-Multiply-Recursive(A_11, B_11)
               + Square-Matrix-Multiply-Recursive(A_12, B_21)
 7      C_12 = Square-Matrix-Multiply-Recursive(A_11, B_12)
               + Square-Matrix-Multiply-Recursive(A_12, B_22)
 8      C_21 = Square-Matrix-Multiply-Recursive(A_21, B_11)
               + Square-Matrix-Multiply-Recursive(A_22, B_21)
 9      C_22 = Square-Matrix-Multiply-Recursive(A_21, B_12)
               + Square-Matrix-Multiply-Recursive(A_22, B_22)
10  return C

This  pseudocode  glosses  over  one  subtle  but  important  implementation  detail. 
How do we partition the matrices in line 5? If we were to create 12 new $n/2 \times n/2$
matrices, we would spend $\Theta(n^2)$ time copying entries. In fact, we can partition
the  matrices  without  copying  entries.  The  trick  is  to  use  index  calculations.  We 
identify  a  submatrix  by  a  range  of  row  indices  and  a  range  of  column  indices  of 
the  original  matrix.  We  end  up  representing  a  submatrix  a  little  differently  from 
how  we  represent  the  original  matrix,  which  is  the  subtlety  we  are  glossing  over. 
The  advantage  is  that,  since  we  can  specify  submatrices  by  index  calculations, 
executing line 5 takes only $\Theta(1)$ time (although we shall see that it makes no
difference  asymptotically  to  the  overall  running  time  whether  we  copy  or  partition 
in  place). 
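One way to realize the index-calculation idea is sketched below in Python. The representation is our assumption, not the book's: a submatrix is identified by the (row, column) offset of its top-left corner in the original matrix plus its size, so that "partitioning" only computes offsets. The parameter names `ar`, `ac`, `br`, `bc` are hypothetical, and n is assumed to be an exact power of 2.

```python
def square_matrix_multiply_recursive(A, B, ar=0, ac=0, br=0, bc=0, n=None):
    """Product of the n x n submatrix of A at (ar, ac) and of B at (br, bc)."""
    if n is None:
        n = len(A)                         # top-level call: whole matrices
    if n == 1:
        return [[A[ar][ac] * B[br][bc]]]   # single scalar multiplication
    h = n // 2                             # quadrant size; offsets are 0 or h
    C = [[0] * n for _ in range(n)]
    for ci, cj in ((0, 0), (0, h), (h, 0), (h, h)):
        # C_ij = A_i1 * B_1j + A_i2 * B_2j, as in equations (4.11)-(4.14)
        P = square_matrix_multiply_recursive(A, B, ar + ci, ac, br, bc + cj, h)
        Q = square_matrix_multiply_recursive(A, B, ar + ci, ac + h, br + h, bc + cj, h)
        for r in range(h):                 # add the two products into C's quadrant
            for c in range(h):
                C[ci + r][cj + c] = P[r][c] + Q[r][c]
    return C
```

No entries of A or B are copied when descending into a quadrant. The sketch does allocate a fresh result matrix at each level, which costs $\Theta(n^2)$ per call and is of the same order as the matrix additions.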

Now, we derive a recurrence to characterize the running time of
Square-Matrix-Multiply-Recursive. Let $T(n)$ be the time to multiply two $n \times n$
matrices using this procedure. In the base case, when n = 1, we perform just the
one scalar multiplication in line 4, and so

$$T(1) = \Theta(1) . \tag{4.15}$$

The recursive case occurs when n > 1. As discussed, partitioning the matrices in
line 5 takes $\Theta(1)$ time, using index calculations. In lines 6-9, we recursively call
Square-Matrix-Multiply-Recursive a total of eight times. Because each
recursive call multiplies two $n/2 \times n/2$ matrices, thereby contributing $T(n/2)$ to
the overall running time, the time taken by all eight recursive calls is $8T(n/2)$. We
also must account for the four matrix additions in lines 6-9. Each of these matrices
contains $n^2/4$ entries, and so each of the four matrix additions takes $\Theta(n^2)$ time.
Since the number of matrix additions is a constant, the total time spent adding
matrices in lines 6-9 is $\Theta(n^2)$. (Again, we use index calculations to place the results
of the matrix additions into the correct positions of matrix C, with an overhead
of $\Theta(1)$ time per entry.) The total time for the recursive case, therefore, is the sum
of the partitioning time, the time for all the recursive calls, and the time to add the
matrices resulting from the recursive calls:

$$\begin{aligned}
T(n) &= \Theta(1) + 8T(n/2) + \Theta(n^2) \\
     &= 8T(n/2) + \Theta(n^2) .
\end{aligned} \tag{4.16}$$


Notice that if we implemented partitioning by copying matrices, which would cost
$\Theta(n^2)$ time, the recurrence would not change, and hence the overall running time
would increase by only a constant factor.

Combining  equations  (4.15)  and  (4.16)  gives  us  the  recurrence  for  the  running 
time  of  Square-Matrix-Multiply-Recursive: 


$$T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 , \\ 8T(n/2) + \Theta(n^2) & \text{if } n > 1 . \end{cases} \tag{4.17}$$


As we shall see from the master method in Section 4.5, recurrence (4.17) has the
solution $T(n) = \Theta(n^3)$. Thus, this simple divide-and-conquer approach is no
faster than the straightforward Square-Matrix-Multiply procedure.

Before we continue on to examining Strassen’s algorithm, let us review where
the components of equation (4.16) came from. Partitioning each $n \times n$ matrix by
index calculation takes $\Theta(1)$ time, but we have two matrices to partition. Although
you could say that partitioning the two matrices takes $\Theta(2)$ time, the constant of 2
is subsumed by the $\Theta$-notation. Adding two matrices, each with, say, k entries,
takes $\Theta(k)$ time. Since the matrices we add each have $n^2/4$ entries, you could
say that adding each pair takes $\Theta(n^2/4)$ time. Again, however, the $\Theta$-notation
subsumes the constant factor of 1/4, and we say that adding two $n/2 \times n/2$
matrices takes $\Theta(n^2)$ time. We have four such matrix additions, and once again,
instead of saying that they take $\Theta(4n^2)$ time, we say that they take $\Theta(n^2)$ time.
(Of course, you might observe that we could say that the four matrix additions
take $\Theta(4n^2/4)$ time, and that $4n^2/4 = n^2$, but the point here is that $\Theta$-notation
subsumes constant factors, whatever they are.) Thus, we end up with two terms
of $\Theta(n^2)$, which we can combine into one.

When we account for the eight recursive calls, however, we cannot just subsume
the constant factor of 8. In other words, we must say that together they take
$8T(n/2)$ time, rather than just $T(n/2)$ time. You can get a feel for why by looking
back at the recursion tree in Figure 2.5, for recurrence (2.1) (which is identical to
recurrence (4.7)), with the recursive case $T(n) = 2T(n/2) + \Theta(n)$. The factor of 2
determined how many children each tree node had, which in turn determined how
many terms contributed to the sum at each level of the tree. If we were to ignore
the factor of 8 in equation (4.16) or the factor of 2 in recurrence (4.1), the recursion
tree would just be linear, rather than “bushy,” and each level would contribute only
one term to the sum.

Bear in mind, therefore, that although asymptotic notation subsumes constant
multiplicative factors, recursive notation such as $T(n/2)$ does not.

Strassen’s method

The key to Strassen’s method is to make the recursion tree slightly less bushy. That
is, instead of performing eight recursive multiplications of $n/2 \times n/2$ matrices,
it performs only seven. The cost of eliminating one matrix multiplication will be
several new additions of $n/2 \times n/2$ matrices, but still only a constant number of
additions. As before, the constant number of matrix additions will be subsumed
by $\Theta$-notation when we set up the recurrence equation to characterize the running
time.

Strassen’s method is not at all obvious. (This might be the biggest understatement
in this book.) It has four steps:

1. Divide the input matrices A and B and output matrix C into $n/2 \times n/2$ submatrices,
as in equation (4.9). This step takes $\Theta(1)$ time by index calculation, just
as in Square-Matrix-Multiply-Recursive.

2. Create 10 matrices $S_1, S_2, \ldots, S_{10}$, each of which is $n/2 \times n/2$ and is the sum
or difference of two matrices created in step 1. We can create all 10 matrices in
$\Theta(n^2)$ time.

3. Using the submatrices created in step 1 and the 10 matrices created in step 2,
recursively compute seven matrix products $P_1, P_2, \ldots, P_7$. Each matrix $P_i$ is
$n/2 \times n/2$.

4. Compute the desired submatrices $C_{11}, C_{12}, C_{21}, C_{22}$ of the result matrix C by
adding and subtracting various combinations of the $P_i$ matrices. We can compute
all four submatrices in $\Theta(n^2)$ time.

We shall see the details of steps 2-4 in a moment, but we already have enough
information to set up a recurrence for the running time of Strassen’s method. Let us
assume that once the matrix size n gets down to 1, we perform a simple scalar
multiplication, just as in line 4 of Square-Matrix-Multiply-Recursive. When
n > 1, steps 1, 2, and 4 take a total of $\Theta(n^2)$ time, and step 3 requires us to perform
seven multiplications of $n/2 \times n/2$ matrices. Hence, we obtain the following
recurrence for the running time $T(n)$ of Strassen’s algorithm:

$$T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 , \\ 7T(n/2) + \Theta(n^2) & \text{if } n > 1 . \end{cases} \tag{4.18}$$

We have traded off one matrix multiplication for a constant number of matrix
additions. Once we understand recurrences and their solutions, we shall see that this
tradeoff actually leads to a lower asymptotic running time. By the master method
in Section 4.5, recurrence (4.18) has the solution $T(n) = \Theta(n^{\lg 7})$.
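To get a feel for the effect of the branching factor, the following Python sketch counts only the scalar multiplications performed at the leaves of the recursion, ignoring the $\Theta(n^2)$ addition work. The function name and the restriction to n being an exact power of 2 are our assumptions for illustration.

```python
def scalar_multiplications(n, branching):
    """Leaf multiplications for n x n inputs when each level of the recursion
    makes `branching` recursive calls on n/2 x n/2 matrices."""
    if n == 1:
        return 1                           # base case: one scalar product
    return branching * scalar_multiplications(n // 2, branching)


for n in (2, 8, 64, 1024):
    eight = scalar_multiplications(n, 8)   # 8 ** lg n, which equals n ** 3
    seven = scalar_multiplications(n, 7)   # 7 ** lg n, which equals n ** lg 7
    print(n, eight, seven, round(seven / eight, 3))
```

The ratio of the two counts is $(7/8)^{\lg n}$, which shrinks toward 0 as n grows; that is the sense in which removing one of the eight recursive calls changes the asymptotic behavior rather than just a constant factor.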

We  now  proceed  to  describe  the  details.  In  step  2,  we  create  the  following  10 
matrices: 


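$$\begin{aligned}
S_1 &= B_{12} - B_{22} , \\
S_2 &= A_{11} + A_{12} , \\
S_3 &= A_{21} + A_{22} , \\
S_4 &= B_{21} - B_{11} , \\
S_5 &= A_{11} + A_{22} , \\
S_6 &= B_{11} + B_{22} , \\
S_7 &= A_{12} - A_{22} , \\
S_8 &= B_{21} + B_{22} , \\
S_9 &= A_{11} - A_{21} , \\
S_{10} &= B_{11} + B_{12} .
\end{aligned}$$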

Since we must add or subtract $n/2 \times n/2$ matrices 10 times, this step does indeed
take $\Theta(n^2)$ time.

In step 3, we recursively multiply $n/2 \times n/2$ matrices seven times to compute the
following $n/2 \times n/2$ matrices, each of which is the sum or difference of products
of A and B submatrices:


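$$\begin{aligned}
P_1 &= A_{11} \cdot S_1 &&= A_{11} \cdot B_{12} - A_{11} \cdot B_{22} , \\
P_2 &= S_2 \cdot B_{22} &&= A_{11} \cdot B_{22} + A_{12} \cdot B_{22} , \\
P_3 &= S_3 \cdot B_{11} &&= A_{21} \cdot B_{11} + A_{22} \cdot B_{11} , \\
P_4 &= A_{22} \cdot S_4 &&= A_{22} \cdot B_{21} - A_{22} \cdot B_{11} , \\
P_5 &= S_5 \cdot S_6 &&= A_{11} \cdot B_{11} + A_{11} \cdot B_{22} + A_{22} \cdot B_{11} + A_{22} \cdot B_{22} , \\
P_6 &= S_7 \cdot S_8 &&= A_{12} \cdot B_{21} + A_{12} \cdot B_{22} - A_{22} \cdot B_{21} - A_{22} \cdot B_{22} , \\
P_7 &= S_9 \cdot S_{10} &&= A_{11} \cdot B_{11} + A_{11} \cdot B_{12} - A_{21} \cdot B_{11} - A_{21} \cdot B_{12} .
\end{aligned}$$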

Note that the only multiplications we need to perform are those in the middle
column of the above equations. The right-hand column just shows what these products
equal in terms of the original submatrices created in step 1.

Step 4 adds and subtracts the $P_i$ matrices created in step 3 to construct the four
$n/2 \times n/2$ submatrices of the product C. We start with

$$C_{11} = P_5 + P_4 - P_2 + P_6 .$$

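Expanding out the right-hand side, with the expansion of each $P_i$ on its own line
and vertically aligning terms that cancel out, we see that $C_{11}$ equals

$$\begin{array}{rrrrrrr}
A_{11} \cdot B_{11} & + A_{11} \cdot B_{22} & + A_{22} \cdot B_{11} & + A_{22} \cdot B_{22} & & & \\
 & & - A_{22} \cdot B_{11} & & + A_{22} \cdot B_{21} & & \\
 & - A_{11} \cdot B_{22} & & & & - A_{12} \cdot B_{22} & \\
 & & & - A_{22} \cdot B_{22} & - A_{22} \cdot B_{21} & + A_{12} \cdot B_{22} & + A_{12} \cdot B_{21} \\
\hline
A_{11} \cdot B_{11} & & & & & & + A_{12} \cdot B_{21} ,
\end{array}$$

which corresponds to equation (4.11).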

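Similarly, we set

$$C_{12} = P_1 + P_2 ,$$

and so $C_{12}$ equals

$$\begin{array}{rrr}
A_{11} \cdot B_{12} & - A_{11} \cdot B_{22} & \\
 & + A_{11} \cdot B_{22} & + A_{12} \cdot B_{22} \\
\hline
A_{11} \cdot B_{12} & & + A_{12} \cdot B_{22} ,
\end{array}$$

corresponding to equation (4.12).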

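Setting

$$C_{21} = P_3 + P_4$$

makes $C_{21}$ equal

$$\begin{array}{rrr}
A_{21} \cdot B_{11} & + A_{22} \cdot B_{11} & \\
 & - A_{22} \cdot B_{11} & + A_{22} \cdot B_{21} \\
\hline
A_{21} \cdot B_{11} & & + A_{22} \cdot B_{21} ,
\end{array}$$

corresponding to equation (4.13).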

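Finally, we set

$$C_{22} = P_5 + P_1 - P_3 - P_7 ,$$

so that $C_{22}$ equals

$$\begin{array}{rrrrrrr}
A_{11} \cdot B_{11} & + A_{11} \cdot B_{22} & + A_{22} \cdot B_{11} & + A_{22} \cdot B_{22} & & & \\
 & - A_{11} \cdot B_{22} & & & + A_{11} \cdot B_{12} & & \\
 & & - A_{22} \cdot B_{11} & & & - A_{21} \cdot B_{11} & \\
- A_{11} \cdot B_{11} & & & & - A_{11} \cdot B_{12} & + A_{21} \cdot B_{11} & + A_{21} \cdot B_{12} \\
\hline
 & & & A_{22} \cdot B_{22} & & & + A_{21} \cdot B_{12} ,
\end{array}$$
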
which corresponds to equation (4.14). Altogether, we add or subtract $n/2 \times n/2$
matrices eight times in step 4, and so this step indeed takes $\Theta(n^2)$ time.

Thus, we see that Strassen’s algorithm, comprising steps 1-4, produces the correct
matrix product and that recurrence (4.18) characterizes its running time. Since
we shall see in Section 4.5 that this recurrence has the solution $T(n) = \Theta(n^{\lg 7})$,
Strassen’s method is asymptotically faster than the straightforward
Square-Matrix-Multiply procedure. The notes at the end of this chapter discuss some
of the practical aspects of Strassen’s algorithm.
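The four steps can be put together in a short Python sketch. It is one possible rendering under stated assumptions rather than a tuned implementation: n is an exact power of 2, matrices are lists of rows, the helper names `mat_add`, `mat_sub`, and `split` are ours, and the quadrants are copied rather than identified by index calculation, which costs $\Theta(n^2)$ per call and so leaves recurrence (4.18) unchanged.

```python
def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Step 1: return the quadrants M11, M12, M21, M22 (copied for simplicity)."""
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])


def strassen(A, B):
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]       # base case: scalar multiplication
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Step 2: the ten sums and differences S1..S10
    S1, S2 = mat_sub(B12, B22), mat_add(A11, A12)
    S3, S4 = mat_add(A21, A22), mat_sub(B21, B11)
    S5, S6 = mat_add(A11, A22), mat_add(B11, B22)
    S7, S8 = mat_sub(A12, A22), mat_add(B21, B22)
    S9, S10 = mat_sub(A11, A21), mat_add(B11, B12)
    # Step 3: seven recursive products
    P1, P2 = strassen(A11, S1), strassen(S2, B22)
    P3, P4 = strassen(S3, B11), strassen(A22, S4)
    P5, P6, P7 = strassen(S5, S6), strassen(S7, S8), strassen(S9, S10)
    # Step 4: combine the products into the quadrants of C
    C11 = mat_add(mat_sub(mat_add(P5, P4), P2), P6)
    C12 = mat_add(P1, P2)
    C21 = mat_add(P3, P4)
    C22 = mat_sub(mat_sub(mat_add(P5, P1), P3), P7)
    top = [l + r for l, r in zip(C11, C12)]
    bottom = [l + r for l, r in zip(C21, C22)]
    return top + bottom
```

Only the seven calls in step 3 recurse; everything else is a constant number of $n/2 \times n/2$ additions, subtractions, and copies.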

Exercises 

Note:  Although  Exercises  4.2-3,  4.2-4,  and  4.2-5  are  about  variants  on  Strassen’s 
algorithm,  you  should  read  Section  4.5  before  trying  to  solve  them. 


4.2-1 

Use  Strassen’s  algorithm  to  compute  the  matrix  product 


Show  your  work. 


4.2-2 

Write  pseudocode  for  Strassen’s  algorithm. 


4.2-3 

How would you modify Strassen’s algorithm to multiply $n \times n$ matrices in which n
is not an exact power of 2? Show that the resulting algorithm runs in time $\Theta(n^{\lg 7})$.


4.2-4 

What is the largest k such that if you can multiply $3 \times 3$ matrices using k
multiplications (not assuming commutativity of multiplication), then you can multiply
$n \times n$ matrices in time $o(n^{\lg 7})$? What would the running time of this algorithm be?


4.2-5 

V. Pan has discovered a way of multiplying $68 \times 68$ matrices using 132,464
multiplications, a way of multiplying $70 \times 70$ matrices using 143,640 multiplications,
and a way of multiplying $72 \times 72$ matrices using 155,424 multiplications. Which
method yields the best asymptotic running time when used in a divide-and-conquer
matrix-multiplication algorithm? How does it compare to Strassen’s algorithm?


4.2-6

How quickly can you multiply a $kn \times n$ matrix by an $n \times kn$ matrix, using Strassen’s
algorithm as a subroutine? Answer the same question with the order of the input
matrices reversed.

4.2-7

Show how to multiply the complex numbers $a + bi$ and $c + di$ using only three
multiplications of real numbers. The algorithm should take a, b, c, and d as input
and produce the real component $ac - bd$ and the imaginary component $ad + bc$
separately.


4.3  The  substitution  method  for  solving  recurrences 

Now that we have seen how recurrences characterize the running times of
divide-and-conquer algorithms, we will learn how to solve recurrences. We start in this
section  with  the  “substitution”  method. 

The  substitution  method  for  solving  recurrences  comprises  two  steps: 

1. Guess the form of the solution.

2.  Use  mathematical  induction  to  find  the  constants  and  show  that  the  solution 
works. 

We  substitute  the  guessed  solution  for  the  function  when  applying  the  inductive 
hypothesis  to  smaller  values;  hence  the  name  “substitution  method.”  This  method 
is  powerful,  but  we  must  be  able  to  guess  the  form  of  the  answer  in  order  to  apply  it. 

We  can  use  the  substitution  method  to  establish  either  upper  or  lower  bounds  on 
a  recurrence.  As  an  example,  let  us  determine  an  upper  bound  on  the  recurrence 

$$T(n) = 2T(\lfloor n/2 \rfloor) + n , \tag{4.19}$$

which is similar to recurrences (4.3) and (4.4). We guess that the solution is
$T(n) = O(n \lg n)$. The substitution method requires us to prove that
$T(n) \le cn \lg n$ for an appropriate choice of the constant $c > 0$. We start by assuming
that this bound holds for all positive $m < n$, in particular for $m = \lfloor n/2 \rfloor$, yielding
$T(\lfloor n/2 \rfloor) \le c \lfloor n/2 \rfloor \lg(\lfloor n/2 \rfloor)$. Substituting into the recurrence yields

$$\begin{aligned}
T(n) &\le 2(c \lfloor n/2 \rfloor \lg(\lfloor n/2 \rfloor)) + n \\
     &\le cn \lg(n/2) + n \\
     &= cn \lg n - cn \lg 2 + n \\
     &= cn \lg n - cn + n \\
     &\le cn \lg n ,
\end{aligned}$$

where the last step holds as long as $c \ge 1$.

Mathematical induction now requires us to show that our solution holds for the
boundary conditions. Typically, we do so by showing that the boundary conditions
are suitable as base cases for the inductive proof. For the recurrence (4.19),
we must show that we can choose the constant c large enough so that the bound
$T(n) \le cn \lg n$ works for the boundary conditions as well. This requirement
can sometimes lead to problems. Let us assume, for the sake of argument, that
$T(1) = 1$ is the sole boundary condition of the recurrence. Then for n = 1, the
bound $T(n) \le cn \lg n$ yields $T(1) \le c \cdot 1 \lg 1 = 0$, which is at odds with $T(1) = 1$.
Consequently, the base case of our inductive proof fails to hold.

We can overcome this obstacle in proving an inductive hypothesis for a specific
boundary condition with only a little more effort. In the recurrence (4.19),
for example, we take advantage of asymptotic notation requiring us only to prove
$T(n) \le cn \lg n$ for $n \ge n_0$, where $n_0$ is a constant that we get to choose. We
keep the troublesome boundary condition $T(1) = 1$, but remove it from consideration
in the inductive proof. We do so by first observing that for $n > 3$, the
recurrence does not depend directly on $T(1)$. Thus, we can replace $T(1)$ by $T(2)$
and $T(3)$ as the base cases in the inductive proof, letting $n_0 = 2$. Note that we
make a distinction between the base case of the recurrence (n = 1) and the base
cases of the inductive proof (n = 2 and n = 3). With $T(1) = 1$, we derive from
the recurrence that $T(2) = 4$ and $T(3) = 5$. Now we can complete the inductive
proof that $T(n) \le cn \lg n$ for some constant $c \ge 1$ by choosing c large enough
so that $T(2) \le c \cdot 2 \lg 2$ and $T(3) \le c \cdot 3 \lg 3$. As it turns out, any choice of $c \ge 2$
suffices for the base cases of n = 2 and n = 3 to hold. For most of the recurrences
we shall examine, it is straightforward to extend boundary conditions to make the
inductive assumption work for small n, and we shall not always explicitly work out
the details.
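A quick numerical check can complement the induction above. The Python sketch below evaluates recurrence (4.19) exactly with the boundary condition $T(1) = 1$ and tests the bound $T(n) \le cn \lg n$ over a finite range. The helper `bound_holds` is a hypothetical name, and a finite check of this kind only illustrates the claim; it does not replace the proof.

```python
from functools import lru_cache
from math import log2


@lru_cache(maxsize=None)
def T(n):
    """Recurrence (4.19) with the boundary condition T(1) = 1."""
    return 1 if n == 1 else 2 * T(n // 2) + n


def bound_holds(c, n_max):
    """True if T(n) <= c * n * lg n for every n in 2..n_max."""
    # small tolerance guards the floating-point comparison where the bound is tight
    return all(T(n) <= c * n * log2(n) + 1e-9 for n in range(2, n_max + 1))


print(T(2), T(3))              # the base cases of the inductive proof
print(bound_holds(2, 10_000))  # c = 2 works from n_0 = 2 onward
print(bound_holds(1, 10_000))  # c = 1 already fails at n = 2
```

The range starts at 2 because, as discussed, the bound cannot hold at n = 1, where $cn \lg n = 0$.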

Making  a  good  guess 

Unfortunately,  there  is  no  general  way  to  guess  the  correct  solutions  to  recurrences. 
Guessing  a  solution  takes  experience  and,  occasionally,  creativity.  Fortunately, 
though,  you  can  use  some  heuristics  to  help  you  become  a  good  guesser.  You 
can  also  use  recursion  trees,  which  we  shall  see  in  Section  4.4,  to  generate  good 
guesses. 

If  a  recurrence  is  similar  to  one  you  have  seen  before,  then  guessing  a  similar 
solution  is  reasonable.  As  an  example,  consider  the  recurrence 

$$T(n) = 2T(\lfloor n/2 \rfloor + 17) + n ,$$

which looks difficult because of the added “17” in the argument to T on the
right-hand side. Intuitively, however, this additional term cannot substantially affect the
solution to the recurrence. When n is large, the difference between $\lfloor n/2 \rfloor$ and
$\lfloor n/2 \rfloor + 17$ is not that large: both cut n nearly evenly in half. Consequently, we
make the guess that $T(n) = O(n \lg n)$, which you can verify as correct by using
the substitution method (see Exercise 4.3-6).

Another way to make a good guess is to prove loose upper and lower bounds on
the recurrence and then reduce the range of uncertainty. For example, we might
start with a lower bound of $T(n) = \Omega(n)$ for the recurrence (4.19), since we
have the term n in the recurrence, and we can prove an initial upper bound of
$T(n) = O(n^2)$. Then, we can gradually lower the upper bound and raise the
lower bound until we converge on the correct, asymptotically tight solution of
$T(n) = \Theta(n \lg n)$.

Subtleties 

Sometimes  you  might  correctly  guess  an  asymptotic  bound  on  the  solution  of  a 
recurrence,  but  somehow  the  math  fails  to  work  out  in  the  induction.  The  problem 
frequently  turns  out  to  be  that  the  inductive  assumption  is  not  strong  enough  to 
prove  the  detailed  bound.  If  you  revise  the  guess  by  subtracting  a  lower-order  term 
when  you  hit  such  a  snag,  the  math  often  goes  through. 

Consider  the  recurrence 

$$T(n) = T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + 1 .$$

We guess that the solution is $T(n) = O(n)$, and we try to show that $T(n) \le cn$ for
an appropriate choice of the constant c. Substituting our guess in the recurrence,
we obtain

$$\begin{aligned}
T(n) &\le c \lfloor n/2 \rfloor + c \lceil n/2 \rceil + 1 \\
     &= cn + 1 ,
\end{aligned}$$

which does not imply $T(n) \le cn$ for any choice of c. We might be tempted to try
a larger guess, say $T(n) = O(n^2)$. Although we can make this larger guess work,
our original guess of $T(n) = O(n)$ is correct. In order to show that it is correct,
however, we must make a stronger inductive hypothesis.

Intuitively, our guess is nearly right: we are off only by the constant 1, a
lower-order term. Nevertheless, mathematical induction does not work unless we
prove the exact form of the inductive hypothesis. We overcome our difficulty
by subtracting a lower-order term from our previous guess. Our new guess is
$T(n) \le cn - d$, where $d \ge 0$ is a constant. We now have

$$\begin{aligned}
T(n) &\le (c \lfloor n/2 \rfloor - d) + (c \lceil n/2 \rceil - d) + 1 \\
     &= cn - 2d + 1 \\
     &\le cn - d ,
\end{aligned}$$

as long as $d \ge 1$. As before, we must choose the constant c large enough to handle
the boundary conditions.

You might find the idea of subtracting a lower-order term counterintuitive. After
all, if the math does not work out, we should increase our guess, right?
Not necessarily! When proving an upper bound by induction, it may actually be
more difficult to prove that a weaker upper bound holds, because in order to prove
the weaker bound, we must use the same weaker bound inductively in the proof.
In our current example, when the recurrence has more than one recursive term, we
get to subtract out the lower-order term of the proposed bound once per recursive
term. In the above example, we subtracted out the constant d twice, once for the
$T(\lfloor n/2 \rfloor)$ term and once for the $T(\lceil n/2 \rceil)$ term. We ended up with the inequality
$T(n) \le cn - 2d + 1$, and it was easy to find values of d to make $cn - 2d + 1$ be
less than or equal to $cn - d$.
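The strengthened hypothesis can be checked numerically. The sketch below assumes the boundary condition $T(1) = 1$, which the text does not fix; under that assumption the recurrence evaluates to exactly $2n - 1$, so the bound $T(n) \le cn - d$ holds with c = 2 and d = 1. The helper name `strengthened_bound_holds` is ours.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """T(n) = T(floor(n/2)) + T(ceil(n/2)) + 1, assuming T(1) = 1."""
    if n == 1:
        return 1
    return T(n // 2) + T(n - n // 2) + 1   # n - n // 2 equals ceil(n/2)


def strengthened_bound_holds(c, d, n_max):
    """True if T(n) <= c * n - d for every n in 1..n_max."""
    return all(T(n) <= c * n - d for n in range(1, n_max + 1))


print(strengthened_bound_holds(2, 1, 2000))   # c = 2, d = 1 (tight here)
print(all(T(n) == 2 * n - 1 for n in range(1, 2001)))
```

Note that the weaker bound $T(n) \le 2n$ is also true for these values; the difficulty described above lies in carrying it through the inductive step, not in its truth.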

Avoiding  pitfalls 

It is easy to err in the use of asymptotic notation. For example, in the
recurrence (4.19) we can falsely “prove” $T(n) = O(n)$ by guessing $T(n) \le cn$ and
then arguing

$$\begin{aligned}
T(n) &\le 2(c \lfloor n/2 \rfloor) + n \\
     &\le cn + n \\
     &= O(n) , \qquad \Longleftarrow \text{wrong!!}
\end{aligned}$$

since c is a constant. The error is that we have not proved the exact form of the
inductive hypothesis, that is, that $T(n) \le cn$. We therefore will explicitly prove
that $T(n) \le cn$ when we want to show that $T(n) = O(n)$.
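A small experiment suggests why no valid proof of $T(n) = O(n)$ can exist for recurrence (4.19). Assuming the boundary condition $T(1) = 1$ (our choice for illustration), the Python sketch below computes the ratio $T(n)/n$ at exact powers of 2; the function name `ratio` is hypothetical.

```python
def T(n):
    """Recurrence (4.19) with T(1) = 1 assumed as the boundary condition."""
    return 1 if n == 1 else 2 * T(n // 2) + n


def ratio(k):
    """T(n) / n for n = 2**k."""
    n = 2 ** k
    return T(n) / n


print([ratio(k) for k in range(6)])   # grows by 1 each time n doubles
```

For $n = 2^k$ the ratio equals $k + 1 = \lg n + 1$, so it exceeds any fixed constant c once n is large enough, which is incompatible with $T(n) \le cn$.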

Changing  variables 

Sometimes, a little algebraic manipulation can make an unknown recurrence similar
to one you have seen before. As an example, consider the recurrence

$$T(n) = 2T(\lfloor \sqrt{n} \rfloor) + \lg n ,$$

which looks difficult. We can simplify this recurrence, though, with a change of
variables. For convenience, we shall not worry about rounding off values, such
as $\sqrt{n}$, to be integers. Renaming $m = \lg n$ yields

$$T(2^m) = 2T(2^{m/2}) + m .$$

We can now rename $S(m) = T(2^m)$ to produce the new recurrence

$$S(m) = 2S(m/2) + m ,$$

which is very much like recurrence (4.19). Indeed, this new recurrence has the
same solution: $S(m) = O(m \lg m)$. Changing back from $S(m)$ to $T(n)$, we obtain

$$T(n) = T(2^m) = S(m) = O(m \lg m) = O(\lg n \lg \lg n) .$$
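The change of variables can be sanity-checked numerically. The Python sketch below assumes the base case $T(2) = 1$ (equivalently $S(1) = 1$), which the text leaves unspecified, and restricts attention to $n = 2^m$ with m an exact power of 2, so that every square root is an integer.

```python
from math import isqrt


def T(n):
    """T(n) = 2 T(floor(sqrt(n))) + lg n, with T(2) = 1 assumed as base case."""
    if n <= 2:
        return 1
    return 2 * T(isqrt(n)) + (n.bit_length() - 1)   # bit_length - 1 is floor(lg n)


def S(m):
    """S(m) = 2 S(m/2) + m, with S(1) = T(2) = 1."""
    return 1 if m == 1 else 2 * S(m // 2) + m


for m in (1, 2, 4, 8, 16, 32):
    print(m, T(2 ** m), S(m))    # the two columns agree
```

Under these assumptions $S(m) = m \lg m + m$, so for example $T(2^{16}) = S(16) = 80$, in line with the $O(\lg n \lg \lg n)$ bound.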


Exercises 


4.3-1

Show that the solution of $T(n) = T(n - 1) + n$ is $O(n^2)$.

4.3-2

Show that the solution of $T(n) = T(\lceil n/2 \rceil) + 1$ is $O(\lg n)$.

4.3-3

We saw that the solution of $T(n) = 2T(\lfloor n/2 \rfloor) + n$ is $O(n \lg n)$. Show that the
solution of this recurrence is also $\Omega(n \lg n)$. Conclude that the solution is $\Theta(n \lg n)$.

4.3-4

Show that by making a different inductive hypothesis, we can overcome the difficulty
with the boundary condition $T(1) = 1$ for recurrence (4.19) without adjusting
the boundary conditions for the inductive proof.

4.3-5

Show that $\Theta(n \lg n)$ is the solution to the “exact” recurrence (4.3) for merge sort.

4.3-6

Show that the solution to $T(n) = 2T(\lfloor n/2 \rfloor + 17) + n$ is $O(n \lg n)$.


4.3-7 

Using  the  master  method  in  Section  4.5,  you  can  show  that  the  solution  to  the 
recurrence  T (/?)  =  AT (/? /3)  +  /?  is  T (??)  =  0(??log34).  Show  that  a  substitution 
proof  with  the  assumption  T(n)  <  c??log34  fails.  Then  show  how  to  subtract  off  a 
lower-order  term  to  make  a  substitution  proof  work. 


4.3-8

Using the master method in Section 4.5, you can show that the solution to the
recurrence $T(n) = 4T(n/2) + n$ is $T(n) = \Theta(n^2)$. Show that a substitution
proof with the assumption $T(n) \le cn^2$ fails. Then show how to subtract off a
lower-order term to make a substitution proof work.

4.3-9

Solve the recurrence $T(n) = 3T(\sqrt{n}) + \log n$ by making a change of variables.
Your solution should be asymptotically tight. Do not worry about whether values
are integral.


4.4  The  recursion-tree  method  for  solving  recurrences 

Although  you  can  use  the  substitution  method  to  provide  a  succinct  proof  that 
a  solution  to  a  recurrence  is  correct,  you  might  have  trouble  coming  up  with  a 
good  guess.  Drawing  out  a  recursion  tree,  as  we  did  in  our  analysis  of  the  merge 
sort  recurrence  in  Section  2.3.2,  serves  as  a  straightforward  way  to  devise  a  good 
guess. In a recursion tree, each node represents the cost of a single subproblem
somewhere  in  the  set  of  recursive  function  invocations.  We  sum  the  costs  within 
each  level  of  the  tree  to  obtain  a  set  of  per-level  costs,  and  then  we  sum  all  the 
per-level  costs  to  determine  the  total  cost  of  all  levels  of  the  recursion. 

A  recursion  tree  is  best  used  to  generate  a  good  guess,  which  you  can  then  verify 
by  the  substitution  method.  When  using  a  recursion  tree  to  generate  a  good  guess, 
you  can  often  tolerate  a  small  amount  of  “sloppiness,”  since  you  will  be  verifying 
your  guess  later  on.  If  you  are  very  careful  when  drawing  out  a  recursion  tree  and 
summing  the  costs,  however,  you  can  use  a  recursion  tree  as  a  direct  proof  of  a 
solution  to  a  recurrence.  In  this  section,  we  will  use  recursion  trees  to  generate 
good  guesses,  and  in  Section  4.6,  we  will  use  recursion  trees  directly  to  prove  the 
theorem  that  forms  the  basis  of  the  master  method. 

For example, let us see how a recursion tree would provide a good guess for
the recurrence $T(n) = 3T(\lfloor n/4 \rfloor) + \Theta(n^2)$. We start by focusing on finding an
upper bound for the solution. Because we know that floors and ceilings usually do
not matter when solving recurrences (here’s an example of sloppiness that we can
tolerate), we create a recursion tree for the recurrence $T(n) = 3T(n/4) + cn^2$,
having written out the implied constant coefficient $c > 0$.

Figure 4.5 shows how we derive the recursion tree for $T(n) = 3T(n/4) + cn^2$.
For convenience, we assume that $n$ is an exact power of 4 (another example of
tolerable sloppiness) so that all subproblem sizes are integers. Part (a) of the figure
shows $T(n)$, which we expand in part (b) into an equivalent tree representing the
recurrence. The $cn^2$ term at the root represents the cost at the top level of recursion,
and the three subtrees of the root represent the costs incurred by the subproblems
of size $n/4$. Part (c) shows this process carried one step further by expanding each
node with cost $T(n/4)$ from part (b). The cost for each of the three children of the
root is $c(n/4)^2$. We continue expanding each node in the tree by breaking it into
its constituent parts as determined by the recurrence.

[Figure 4.5: tree drawing not reproduced. In part (d), the root level costs $cn^2$, the leaf level costs $\Theta(n^{\log_4 3})$, and the total is $O(n^2)$.]

Figure 4.5 Constructing a recursion tree for the recurrence $T(n) = 3T(n/4) + cn^2$. Part (a)
shows $T(n)$, which progressively expands in (b)-(d) to form the recursion tree. The fully expanded
tree in part (d) has height $\log_4 n$ (it has $\log_4 n + 1$ levels).

Because subproblem sizes decrease by a factor of 4 each time we go down one
level, we eventually must reach a boundary condition. How far from the root do
we reach one? The subproblem size for a node at depth $i$ is $n/4^i$. Thus, the
subproblem size hits $n = 1$ when $n/4^i = 1$ or, equivalently, when $i = \log_4 n$.
Thus, the tree has $\log_4 n + 1$ levels (at depths $0, 1, 2, \ldots, \log_4 n$).

Next we determine the cost at each level of the tree. Each level has three times
more nodes than the level above, and so the number of nodes at depth $i$ is $3^i$.
Because subproblem sizes reduce by a factor of 4 for each level we go down
from the root, each node at depth $i$, for $i = 0, 1, 2, \ldots, \log_4 n - 1$, has a cost
of $c(n/4^i)^2$. Multiplying, we see that the total cost over all nodes at depth $i$, for
$i = 0, 1, 2, \ldots, \log_4 n - 1$, is $3^i c(n/4^i)^2 = (3/16)^i cn^2$. The bottom level, at
depth $\log_4 n$, has $3^{\log_4 n} = n^{\log_4 3}$ nodes, each contributing cost $T(1)$, for a total
cost of $n^{\log_4 3} T(1)$, which is $\Theta(n^{\log_4 3})$, since we assume that $T(1)$ is a constant.
Now we add up the costs over all levels to determine the cost for the entire tree:

\[
\begin{aligned}
T(n) &= cn^2 + \frac{3}{16} cn^2 + \left(\frac{3}{16}\right)^2 cn^2 + \cdots + \left(\frac{3}{16}\right)^{\log_4 n - 1} cn^2 + \Theta(n^{\log_4 3}) \\
     &= \sum_{i=0}^{\log_4 n - 1} \left(\frac{3}{16}\right)^i cn^2 + \Theta(n^{\log_4 3}) \\
     &= \frac{(3/16)^{\log_4 n} - 1}{(3/16) - 1} \, cn^2 + \Theta(n^{\log_4 3}) \qquad \text{(by equation (A.5))} .
\end{aligned}
\]

This  last  formula  looks  somewhat  messy  until  we  realize  that  we  can  again  take 
advantage  of  small  amounts  of  sloppiness  and  use  an  infinite  decreasing  geometric 
series  as  an  upper  bound.  Backing  up  one  step  and  applying  equation  (A.6),  we 
have 

\[
\begin{aligned}
T(n) &= \sum_{i=0}^{\log_4 n - 1} \left(\frac{3}{16}\right)^i cn^2 + \Theta(n^{\log_4 3}) \\
     &< \sum_{i=0}^{\infty} \left(\frac{3}{16}\right)^i cn^2 + \Theta(n^{\log_4 3}) \\
     &= \frac{1}{1 - (3/16)} \, cn^2 + \Theta(n^{\log_4 3}) \\
     &= \frac{16}{13} \, cn^2 + \Theta(n^{\log_4 3}) \\
     &= O(n^2) .
\end{aligned}
\]

Thus, we have derived a guess of $T(n) = O(n^2)$ for our original recurrence
$T(n) = 3T(\lfloor n/4 \rfloor) + \Theta(n^2)$. In this example, the coefficients of $cn^2$ form a
decreasing geometric series and, by equation (A.6), the sum of these coefficients
is bounded from above by the constant 16/13. Since the root’s contribution to the
total cost is $cn^2$, the root contributes a constant fraction of the total cost. In other
words, the cost of the root dominates the total cost of the tree.

[Figure 4.6: tree drawing not reproduced. Each of the top levels sums to $cn$, the height is $\log_{3/2} n$, and the total is $O(n \lg n)$.]

Figure 4.6 A recursion tree for the recurrence $T(n) = T(n/3) + T(2n/3) + cn$.

In fact, if $O(n^2)$ is indeed an upper bound for the recurrence (as we shall verify in
a moment), then it must be a tight bound. Why? The first recursive call contributes
a cost of $\Theta(n^2)$, and so $\Omega(n^2)$ must be a lower bound for the recurrence.
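The level-by-level accounting for $T(n) = 3T(n/4) + cn^2$ can be reproduced mechanically. The sketch below lists the cost of every level of the recursion tree for $n$ an exact power of 4. The leaf cost $T(1) = 1$ and the default $c = 1$ are assumptions chosen only to keep the arithmetic in integers.

```python
def level_costs(n: int, c: int = 1) -> list[int]:
    """Per-level costs of the recursion tree for T(n) = 3*T(n/4) + c*n^2.

    n must be an exact power of 4. Depth i holds 3^i nodes, each of cost
    c*(n/4^i)^2. The final entry is the leaf level, assuming T(1) = 1.
    """
    costs = []
    depth, size = 0, n
    while size > 1:
        costs.append(3 ** depth * c * size ** 2)
        size //= 4
        depth += 1
    costs.append(3 ** depth)  # 3^(log_4 n) leaves, each of assumed cost 1
    return costs
```

For example, `level_costs(4 ** 6)` has seven entries (depths 0 through 6), each internal level costs 3/16 of the one above it, and the internal levels together stay below $(16/13)\,cn^2$.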

Now we can use the substitution method to verify that our guess was correct,
that is, $T(n) = O(n^2)$ is an upper bound for the recurrence
$T(n) = 3T(\lfloor n/4 \rfloor) + \Theta(n^2)$. We want to show that $T(n) \le dn^2$ for some constant $d > 0$.
Using the same constant $c > 0$ as before, we have

\[
\begin{aligned}
T(n) &\le 3T(\lfloor n/4 \rfloor) + cn^2 \\
     &\le 3d \lfloor n/4 \rfloor^2 + cn^2 \\
     &\le 3d (n/4)^2 + cn^2 \\
     &= \frac{3}{16} \, dn^2 + cn^2 \\
     &\le dn^2 ,
\end{aligned}
\]

where the last step holds as long as $d \ge (16/13)c$.
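As a sanity check on the substitution proof, the sketch below evaluates the recurrence with the floor kept in place and tests the claim $T(n) \le dn^2$ using exact rational arithmetic. The base case $T(0) = 0$ and the value $c = 1$ are assumptions. The constant $d$ is set to the smallest value the proof permits, $(16/13)c$.

```python
from fractions import Fraction
from functools import lru_cache

C = 1                     # constant hidden in the Theta(n^2) term (assumed value)
D = Fraction(16, 13) * C  # smallest d allowed by the condition d >= (16/13)c


@lru_cache(maxsize=None)
def T(n: int) -> int:
    """T(n) = 3*T(floor(n/4)) + C*n^2, with the assumed base case T(0) = 0."""
    if n == 0:
        return 0
    return 3 * T(n // 4) + C * n * n


def bound_holds(n: int) -> bool:
    """Check the inductive claim T(n) <= D*n^2 exactly."""
    return T(n) <= D * n * n
```

Calling `bound_holds(n)` over a range of positive `n` exercises the same inequality chain as the proof, one value at a time.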

In another, more intricate, example, Figure 4.6 shows the recursion tree for

\[ T(n) = T(n/3) + T(2n/3) + O(n) . \]

(Again, we omit floor and ceiling functions for simplicity.) As before, we let $c$
represent the constant factor in the $O(n)$ term. When we add the values across the
levels of the recursion tree shown in the figure, we get a value of $cn$ for every level.
The longest simple path from the root to a leaf is $n \to (2/3)n \to (2/3)^2 n \to \cdots \to 1$.
Since $(2/3)^k n = 1$ when $k = \log_{3/2} n$, the height of the tree is $\log_{3/2} n$.

Intuitively, we expect the solution to the recurrence to be at most the number
of levels times the cost of each level, or $O(cn \log_{3/2} n) = O(n \lg n)$. Figure 4.6
shows only the top levels of the recursion tree, however, and not every level in the
tree contributes a cost of $cn$. Consider the cost of the leaves. If this recursion tree
were a complete binary tree of height $\log_{3/2} n$, there would be $2^{\log_{3/2} n} = n^{\log_{3/2} 2}$
leaves. Since the cost of each leaf is a constant, the total cost of all leaves would
then be $\Theta(n^{\log_{3/2} 2})$ which, since $\log_{3/2} 2$ is a constant strictly greater than 1,
is $\omega(n \lg n)$. This recursion tree is not a complete binary tree, however, and so
it has fewer than $n^{\log_{3/2} 2}$ leaves. Moreover, as we go down from the root, more
and more internal nodes are absent. Consequently, levels toward the bottom of the
recursion tree contribute less than $cn$ to the total cost. We could work out an accurate
accounting of all costs, but remember that we are just trying to come up with a
guess to use in the substitution method. Let us tolerate the sloppiness and attempt
to show that a guess of $O(n \lg n)$ for the upper bound is correct.

Indeed, we can use the substitution method to verify that $O(n \lg n)$ is an upper
bound for the solution to the recurrence. We show that $T(n) \le dn \lg n$, where $d$ is
a suitable positive constant. We have

\[
\begin{aligned}
T(n) &\le T(n/3) + T(2n/3) + cn \\
     &\le d(n/3) \lg(n/3) + d(2n/3) \lg(2n/3) + cn \\
     &= (d(n/3) \lg n - d(n/3) \lg 3) + (d(2n/3) \lg n - d(2n/3) \lg(3/2)) + cn \\
     &= dn \lg n - d((n/3) \lg 3 + (2n/3) \lg(3/2)) + cn \\
     &= dn \lg n - d((n/3) \lg 3 + (2n/3) \lg 3 - (2n/3) \lg 2) + cn \\
     &= dn \lg n - dn(\lg 3 - 2/3) + cn \\
     &\le dn \lg n ,
\end{aligned}
\]

as long as $d \ge c/(\lg 3 - (2/3))$. Thus, we did not need to perform a more accurate
accounting of costs in the recursion tree.
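The bound can also be spot-checked numerically. This sketch evaluates a floored version of the recurrence, $T(n) = T(\lfloor n/3 \rfloor) + T(\lfloor 2n/3 \rfloor) + cn$, and compares it with $dn \lg n$ for $d$ at the threshold $c/(\lg 3 - 2/3)$. The floors, the base cases $T(0) = T(1) = 0$, and the value $c = 1$ are assumptions introduced so that the recurrence is defined on all integers.

```python
import math
from functools import lru_cache

C = 1                              # constant factor of the O(n) term (assumed value)
D = C / (math.log2(3) - 2 / 3)     # threshold from the condition d >= c / (lg 3 - 2/3)


@lru_cache(maxsize=None)
def T(n: int) -> int:
    """T(n) = T(floor(n/3)) + T(floor(2n/3)) + C*n, assuming T(0) = T(1) = 0."""
    if n <= 1:
        return 0
    return T(n // 3) + T(2 * n // 3) + C * n


def upper_bound(n: int) -> float:
    """The guessed bound d * n * lg n, for n >= 1."""
    return D * n * math.log2(n)
```

Comparing `T(n)` with `upper_bound(n)` for a range of `n` shows the guess holding with room to spare, consistent with the observation that a more careful accounting was unnecessary.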


Exercises 


4.4-1

Use a recursion tree to determine a good asymptotic upper bound on the recurrence
$T(n) = 3T(\lfloor n/2 \rfloor) + n$. Use the substitution method to verify your answer.

4.4-2

Use a recursion tree to determine a good asymptotic upper bound on the recurrence
$T(n) = T(n/2) + n^2$. Use the substitution method to verify your answer.




4.4-3

Use a recursion tree to determine a good asymptotic upper bound on the recurrence
$T(n) = 4T(n/2 + 2) + n$. Use the substitution method to verify your answer.

4.4-4

Use a recursion tree to determine a good asymptotic upper bound on the recurrence
$T(n) = 2T(n-1) + 1$. Use the substitution method to verify your answer.

4.4-5

Use a recursion tree to determine a good asymptotic upper bound on the recurrence
$T(n) = T(n-1) + T(n/2) + n$. Use the substitution method to verify your answer.

4.4-6

Argue that the solution to the recurrence $T(n) = T(n/3) + T(2n/3) + cn$, where $c$
is a constant, is $\Omega(n \lg n)$ by appealing to a recursion tree.

4.4-7

Draw the recursion tree for $T(n) = 4T(\lfloor n/2 \rfloor) + cn$, where $c$ is a constant, and
provide a tight asymptotic bound on its solution. Verify your bound by the
substitution method.

4.4-8

Use a recursion tree to give an asymptotically tight solution to the recurrence
$T(n) = T(n-a) + T(a) + cn$, where $a \ge 1$ and $c > 0$ are constants.

4.4-9

Use a recursion tree to give an asymptotically tight solution to the recurrence
$T(n) = T(\alpha n) + T((1-\alpha)n) + cn$, where $\alpha$ is a constant in the range $0 < \alpha < 1$
and $c > 0$ is also a constant.


4.5  The  master  method  for  solving  recurrences 

The  master  method  provides  a  “cookbook”  method  for  solving  recurrences  of  the 
form 

T(n)  =  aT(n/b)  +  f(n)  ,  (4.20) 

where $a \ge 1$ and $b > 1$ are constants and $f(n)$ is an asymptotically positive
function. To use the master method, you will need to memorize three cases, but
then you will be able to solve many recurrences quite easily, often without pencil
and paper.

The recurrence (4.20) describes the running time of an algorithm that divides a
problem of size $n$ into $a$ subproblems, each of size $n/b$, where $a$ and $b$ are positive
constants. The $a$ subproblems are solved recursively, each in time $T(n/b)$. The
function $f(n)$ encompasses the cost of dividing the problem and combining the
results of the subproblems. For example, the recurrence arising from Strassen’s
algorithm has $a = 7$, $b = 2$, and $f(n) = \Theta(n^2)$.

As a matter of technical correctness, the recurrence is not actually well defined,
because $n/b$ might not be an integer. Replacing each of the $a$ terms $T(n/b)$ with
either $T(\lfloor n/b \rfloor)$ or $T(\lceil n/b \rceil)$ will not affect the asymptotic behavior of the
recurrence, however. (We will prove this assertion in the next section.) We normally
find it convenient, therefore, to omit the floor and ceiling functions when writing
divide-and-conquer recurrences of this form.

The  master  theorem 

The  master  method  depends  on  the  following  theorem. 

Theorem 4.1 (Master theorem)

Let $a \ge 1$ and $b > 1$ be constants, let $f(n)$ be a function, and let $T(n)$ be defined
on the nonnegative integers by the recurrence

\[ T(n) = aT(n/b) + f(n) , \]

where we interpret $n/b$ to mean either $\lfloor n/b \rfloor$ or $\lceil n/b \rceil$. Then $T(n)$ has the
following asymptotic bounds:

1. If $f(n) = O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$.

2. If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \lg n)$.

3. If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some constant $\epsilon > 0$, and if $af(n/b) \le cf(n)$ for
some constant $c < 1$ and all sufficiently large $n$, then $T(n) = \Theta(f(n))$. ■


Before applying the master theorem to some examples, let’s spend a moment
trying to understand what it says. In each of the three cases, we compare the
function $f(n)$ with the function $n^{\log_b a}$. Intuitively, the larger of the two functions
determines the solution to the recurrence. If, as in case 1, the function $n^{\log_b a}$ is the
larger, then the solution is $T(n) = \Theta(n^{\log_b a})$. If, as in case 3, the function $f(n)$
is the larger, then the solution is $T(n) = \Theta(f(n))$. If, as in case 2, the two functions
are the same size, we multiply by a logarithmic factor, and the solution is
$T(n) = \Theta(n^{\log_b a} \lg n) = \Theta(f(n) \lg n)$.

Beyond this intuition, you need to be aware of some technicalities. In the first
case, not only must $f(n)$ be smaller than $n^{\log_b a}$, it must be polynomially smaller.
That is, $f(n)$ must be asymptotically smaller than $n^{\log_b a}$ by a factor of $n^{\epsilon}$ for some
constant $\epsilon > 0$. In the third case, not only must $f(n)$ be larger than $n^{\log_b a}$, it also
must be polynomially larger and in addition satisfy the “regularity” condition that
$af(n/b) \le cf(n)$. This condition is satisfied by most of the polynomially bounded
functions that we shall encounter.

Note that the three cases do not cover all the possibilities for $f(n)$. There is
a gap between cases 1 and 2 when $f(n)$ is smaller than $n^{\log_b a}$ but not polynomially
smaller. Similarly, there is a gap between cases 2 and 3 when $f(n)$ is larger
than $n^{\log_b a}$ but not polynomially larger. If the function $f(n)$ falls into one of these
gaps, or if the regularity condition in case 3 fails to hold, you cannot use the master
method to solve the recurrence.

Using  the  master  method 

To  use  the  master  method,  we  simply  determine  which  case  (if  any)  of  the  master 
theorem  applies  and  write  down  the  answer. 
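For the common situation in which the driving function is a plain power of $n$, the case analysis is mechanical enough to sketch in code. The helper below is hypothetical (its name, its signature and its output strings are not from the text) and handles only $f(n) = \Theta(n^k)$. With that restriction $f(n)$ can never fall into a gap between cases, and the regularity condition of case 3 holds automatically with $c = a/b^k < 1$.

```python
import math


def master_case(a: float, b: float, k: float) -> tuple[int, str]:
    """Classify T(n) = a*T(n/b) + Theta(n^k) under the master theorem.

    Only polynomial driving functions f(n) = Theta(n^k) are handled; this is
    an assumption of the sketch, not a limitation of the theorem itself.
    """
    if a < 1 or b <= 1:
        raise ValueError("the master theorem requires a >= 1 and b > 1")
    critical = math.log(a) / math.log(b)  # the exponent log_b a
    if math.isclose(k, critical, abs_tol=1e-12):
        return 2, f"Theta(n^{k:g} lg n)"
    if k < critical:
        return 1, f"Theta(n^{critical:.3f})"
    return 3, f"Theta(n^{k:g})"
```

For instance, `master_case(9, 3, 1)` reports case 1 and `master_case(1, 1.5, 0)` reports case 2. A driving function such as $n \lg n$ is outside what this helper models and must be analyzed by hand.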

As a first example, consider

\[ T(n) = 9T(n/3) + n . \]

For this recurrence, we have $a = 9$, $b = 3$, $f(n) = n$, and thus we have that
$n^{\log_b a} = n^{\log_3 9} = \Theta(n^2)$. Since $f(n) = O(n^{\log_3 9 - \epsilon})$, where $\epsilon = 1$, we can apply
case 1 of the master theorem and conclude that the solution is $T(n) = \Theta(n^2)$.
Now consider

\[ T(n) = T(2n/3) + 1 , \]

in which $a = 1$, $b = 3/2$, $f(n) = 1$, and $n^{\log_b a} = n^{\log_{3/2} 1} = n^0 = 1$. Case 2
applies, since $f(n) = \Theta(n^{\log_b a}) = \Theta(1)$, and thus the solution to the recurrence
is $T(n) = \Theta(\lg n)$.

For the recurrence

\[ T(n) = 3T(n/4) + n \lg n , \]

we have $a = 3$, $b = 4$, $f(n) = n \lg n$, and $n^{\log_b a} = n^{\log_4 3} = O(n^{0.793})$.
Since $f(n) = \Omega(n^{\log_4 3 + \epsilon})$, where $\epsilon \approx 0.2$, case 3 applies if we can show that
the regularity condition holds for $f(n)$. For sufficiently large $n$, we have that
$af(n/b) = 3(n/4) \lg(n/4) \le (3/4) n \lg n = cf(n)$ for $c = 3/4$. Consequently,
by case 3, the solution to the recurrence is $T(n) = \Theta(n \lg n)$.
The master method does not apply to the recurrence

\[ T(n) = 2T(n/2) + n \lg n , \]

even though it appears to have the proper form: $a = 2$, $b = 2$, $f(n) = n \lg n$,
and $n^{\log_b a} = n$. You might mistakenly think that case 3 should apply, since
$f(n) = n \lg n$ is asymptotically larger than $n^{\log_b a} = n$. The problem is that it
is not polynomially larger. The ratio $f(n)/n^{\log_b a} = (n \lg n)/n = \lg n$ is
asymptotically less than $n^{\epsilon}$ for any positive constant $\epsilon$. Consequently, the recurrence falls
into the gap between case 2 and case 3. (See Exercise 4.6-2 for a solution.)
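Although the master theorem is silent here, unrolling the recurrence on exact powers of 2 shows how it behaves. The sketch below assumes the base case $T(1) = 0$, which is a choice made for illustration, and compares the recursive evaluation with the expression $n \lg n (\lg n + 1)/2$.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n: int) -> int:
    """T(n) = 2*T(n/2) + n*lg n on exact powers of 2, assuming T(1) = 0."""
    if n == 1:
        return 0
    lg_n = n.bit_length() - 1  # exact lg n because n is a power of 2
    return 2 * T(n // 2) + n * lg_n


def closed_form(k: int) -> int:
    """n * lg n * (lg n + 1) / 2 evaluated at n = 2^k."""
    return 2 ** k * k * (k + 1) // 2
```

The two agree for every power of 2, so on this domain the solution grows like $n \lg^2 n$: faster than the $n \lg n$ that case 2 would give for $f(n) = n$, yet not $\Theta(f(n))$ as case 3 would require.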

Let’s use the master method to solve the recurrences we saw in Sections 4.1
and 4.2. Recurrence (4.7),

\[ T(n) = 2T(n/2) + \Theta(n) , \]

characterizes the running times of the divide-and-conquer algorithm for both the
maximum-subarray problem and merge sort. (As is our practice, we omit stating
the base case in the recurrence.) Here, we have $a = 2$, $b = 2$, $f(n) = \Theta(n)$, and
thus we have that $n^{\log_b a} = n^{\log_2 2} = n$. Case 2 applies, since $f(n) = \Theta(n)$, and so
we have the solution $T(n) = \Theta(n \lg n)$.

Recurrence (4.17),

\[ T(n) = 8T(n/2) + \Theta(n^2) , \]

describes the running time of the first divide-and-conquer algorithm that we saw
for matrix multiplication. Now we have $a = 8$, $b = 2$, and $f(n) = \Theta(n^2)$,
and so $n^{\log_b a} = n^{\log_2 8} = n^3$. Since $n^3$ is polynomially larger than $f(n)$ (that is,
$f(n) = O(n^{3 - \epsilon})$ for $\epsilon = 1$), case 1 applies, and $T(n) = \Theta(n^3)$.

Finally, consider recurrence (4.18),

\[ T(n) = 7T(n/2) + \Theta(n^2) , \]

which describes the running time of Strassen’s algorithm. Here, we have $a = 7$,
$b = 2$, $f(n) = \Theta(n^2)$, and thus $n^{\log_b a} = n^{\log_2 7}$. Rewriting $\log_2 7$ as $\lg 7$ and
recalling that $2.80 < \lg 7 < 2.81$, we see that $f(n) = O(n^{\lg 7 - \epsilon})$ for $\epsilon = 0.8$.
Again, case 1 applies, and we have the solution $T(n) = \Theta(n^{\lg 7})$.
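To see case 1 at work on the Strassen recurrence, the sketch below evaluates $T(n) = 7T(n/2) + n^2$ on exact powers of 2. Taking the $\Theta(n^2)$ term to be exactly $n^2$ and the base case to be $T(1) = 1$ are assumptions made so that the values are integers.

```python
def T(n: int) -> int:
    """T(n) = 7*T(n/2) + n^2 on exact powers of 2, assuming T(1) = 1."""
    if n == 1:
        return 1
    return 7 * T(n // 2) + n * n


def leaf_count(k: int) -> int:
    """Number of leaves 7^k = n^(lg 7) of the recursion tree for n = 2^k."""
    return 7 ** k
```

Under these assumptions $3\,T(2^k) = 7^{k+1} - 4^{k+1}$, so $T(n)$ stays within a constant factor (below 7/3) of the leaf count $n^{\lg 7}$, as case 1 predicts.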

Exercises 


4.5-1

Use the master method to give tight asymptotic bounds for the following
recurrences.

a. $T(n) = 2T(n/4) + 1$.

b. $T(n) = 2T(n/4) + \sqrt{n}$.

c. $T(n) = 2T(n/4) + n$.

d. $T(n) = 2T(n/4) + n^2$.

4.5-2

Professor Caesar wishes to develop a matrix-multiplication algorithm that is
asymptotically faster than Strassen’s algorithm. His algorithm will use the
divide-and-conquer method, dividing each matrix into pieces of size $n/4 \times n/4$, and the
divide and combine steps together will take $\Theta(n^2)$ time. He needs to determine
how many subproblems his algorithm has to create in order to beat Strassen’s
algorithm. If his algorithm creates $a$ subproblems, then the recurrence for the running
time $T(n)$ becomes $T(n) = aT(n/4) + \Theta(n^2)$. What is the largest integer value
of $a$ for which Professor Caesar’s algorithm would be asymptotically faster than
Strassen’s algorithm?

4.5-3

Use the master method to show that the solution to the binary-search recurrence
$T(n) = T(n/2) + \Theta(1)$ is $T(n) = \Theta(\lg n)$. (See Exercise 2.3-5 for a description
of binary search.)

4.5-4

Can the master method be applied to the recurrence $T(n) = 4T(n/2) + n^2 \lg n$?
Why or why not? Give an asymptotic upper bound for this recurrence.

4.5-5 ★

Consider the regularity condition $af(n/b) \le cf(n)$ for some constant $c < 1$,
which is part of case 3 of the master theorem. Give an example of constants $a \ge 1$
and $b > 1$ and a function $f(n)$ that satisfies all the conditions in case 3 of the
master theorem except the regularity condition.


★  4.6  Proof  of  the  master  theorem 

This  section  contains  a  proof  of  the  master  theorem  (Theorem  4.1).  You  do  not 
need  to  understand  the  proof  in  order  to  apply  the  master  theorem. 

The proof appears in two parts. The first part analyzes the master
recurrence (4.20), under the simplifying assumption that $T(n)$ is defined only on
exact powers of $b > 1$, that is, for $n = 1, b, b^2, \ldots$. This part gives all the intuition
needed to understand why the master theorem is true. The second part shows how
to extend the analysis to all positive integers $n$; it applies mathematical technique
to the problem of handling floors and ceilings.

In this section, we shall sometimes abuse our asymptotic notation slightly by
using it to describe the behavior of functions that are defined only over exact
powers of $b$. Recall that the definitions of asymptotic notations require that
bounds be proved for all sufficiently large numbers, not just those that are
powers of $b$. Since we could make new asymptotic notations that apply only to the set
$\{b^i : i = 0, 1, 2, \ldots\}$, instead of to the nonnegative numbers, this abuse is minor.

Nevertheless, we must always be on guard when we use asymptotic notation over
a limited domain lest we draw improper conclusions. For example, proving that
$T(n) = O(n)$ when $n$ is an exact power of 2 does not guarantee that $T(n) = O(n)$.
The function $T(n)$ could be defined as

\[
T(n) = \begin{cases} n & \text{if } n = 1, 2, 4, 8, \ldots , \\ n^2 & \text{otherwise} , \end{cases}
\]

in which case the best upper bound that applies to all values of $n$ is $T(n) = O(n^2)$.
Because of this sort of drastic consequence, we shall never use asymptotic notation
over a limited domain without making it absolutely clear from the context that we
are doing so.
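The piecewise function above is easy to write down directly, which makes the danger tangible. In this sketch the helper name `is_power_of_two` is illustrative only.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is one of 1, 2, 4, 8, ..."""
    return n > 0 and n & (n - 1) == 0


def T(n: int) -> int:
    """T(n) = n on exact powers of 2 and n^2 everywhere else."""
    return n if is_power_of_two(n) else n * n
```

Sampling only powers of 2 gives `T(n) == n` every time, while an input such as 101 already exceeds `100 * n`; a bound established on the restricted domain says nothing about the remaining integers.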

4.6.1  The  proof  for  exact  powers 


The first part of the proof of the master theorem analyzes the recurrence (4.20)

\[ T(n) = aT(n/b) + f(n) , \]

for the master method, under the assumption that $n$ is an exact power of $b > 1$,
where $b$ need not be an integer. We break the analysis into three lemmas. The first
reduces the problem of solving the master recurrence to the problem of evaluating
an expression that contains a summation. The second determines bounds on this
summation. The third lemma puts the first two together to prove a version of the
master theorem for the case in which $n$ is an exact power of $b$.

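Lemma 4.2

Let $a \ge 1$ and $b > 1$ be constants, and let $f(n)$ be a nonnegative function defined
on exact powers of $b$. Define $T(n)$ on exact powers of $b$ by the recurrence

\[
T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 , \\ aT(n/b) + f(n) & \text{if } n = b^i , \end{cases}
\]

where $i$ is a positive integer. Then

\[
T(n) = \Theta(n^{\log_b a}) + \sum_{j=0}^{\log_b n - 1} a^j f(n/b^j) . \tag{4.21}
\]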


Proof We use the recursion tree in Figure 4.7. The root of the tree has cost $f(n)$,
and it has $a$ children, each with cost $f(n/b)$. (It is convenient to think of $a$ as being
an integer, especially when visualizing the recursion tree, but the mathematics does
not require it.) Each of these children has $a$ children, making $a^2$ nodes at depth 2,
and each of the $a$ children has cost $f(n/b^2)$. In general, there are $a^j$ nodes at
depth $j$, and each has cost $f(n/b^j)$. The cost of each leaf is $T(1) = \Theta(1)$, and
each leaf is at depth $\log_b n$, since $n/b^{\log_b n} = 1$. There are $a^{\log_b n} = n^{\log_b a}$ leaves
in the tree.

[Figure 4.7: tree drawing not reproduced. The levels cost $f(n)$, $af(n/b)$, $a^2 f(n/b^2)$, and so on, the leaf level costs $\Theta(n^{\log_b a})$, and the total is $\Theta(n^{\log_b a}) + \sum_{j=0}^{\log_b n - 1} a^j f(n/b^j)$.]

Figure 4.7 The recursion tree generated by $T(n) = aT(n/b) + f(n)$. The tree is a complete $a$-ary
tree with $n^{\log_b a}$ leaves and height $\log_b n$. The cost of the nodes at each depth is shown at the right,
and their sum is given in equation (4.21).

We can obtain equation (4.21) by summing the costs of the nodes at each depth
in the tree, as shown in the figure. The cost for all internal nodes at depth $j$ is
$a^j f(n/b^j)$, and so the total cost of all internal nodes is

\[ \sum_{j=0}^{\log_b n - 1} a^j f(n/b^j) . \]

In the underlying divide-and-conquer algorithm, this sum represents the costs of
dividing problems into subproblems and then recombining the subproblems. The
cost of all the leaves, which is the cost of doing all $n^{\log_b a}$ subproblems of size 1,
is $\Theta(n^{\log_b a})$. ■
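Lemma 4.2 can be checked on concrete instances by computing $T(n)$ two ways: by following the recurrence, and by summing the recursion tree one level at a time. The sketch below assumes integer $a$ and $b$ and a leaf cost of exactly $T(1) = 1$ so that both computations are exact. The function names are illustrative.

```python
from typing import Callable


def T_recursive(n: int, a: int, b: int, f: Callable[[int], int]) -> int:
    """T(n) = a*T(n/b) + f(n) on exact powers of b, assuming T(1) = 1."""
    if n == 1:
        return 1
    return a * T_recursive(n // b, a, b, f) + f(n)


def T_by_levels(n: int, a: int, b: int, f: Callable[[int], int]) -> int:
    """Sum the recursion tree level by level, in the manner of equation (4.21)."""
    total, nodes, size = 0, 1, n
    while size > 1:
        total += nodes * f(size)  # a^j nodes at depth j, each of cost f(n / b^j)
        nodes *= a
        size //= b
    return total + nodes          # a^(log_b n) = n^(log_b a) leaves, each of cost 1
```

The two functions return the same value for every exact power of $b$, which is the content of the lemma with the $\Theta(n^{\log_b a})$ term instantiated as the exact leaf count.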

In  terms  of  the  recursion  tree,  the  three  cases  of  the  master  theorem  correspond 
to  cases  in  which  the  total  cost  of  the  tree  is  (1)  dominated  by  the  costs  in  the 
leaves,  (2)  evenly  distributed  among  the  levels  of  the  tree,  or  (3)  dominated  by  the 
cost  of  the  root. 

The summation in equation (4.21) describes the cost of the dividing and combining
steps in the underlying divide-and-conquer algorithm. The next lemma provides
asymptotic bounds on the summation’s growth.

Lemma 4.3

Let $a \ge 1$ and $b > 1$ be constants, and let $f(n)$ be a nonnegative function defined
on exact powers of $b$. A function $g(n)$ defined over exact powers of $b$ by

\[
g(n) = \sum_{j=0}^{\log_b n - 1} a^j f(n/b^j) \tag{4.22}
\]

has the following asymptotic bounds for exact powers of $b$:

1. If $f(n) = O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$, then $g(n) = O(n^{\log_b a})$.

2. If $f(n) = \Theta(n^{\log_b a})$, then $g(n) = \Theta(n^{\log_b a} \lg n)$.

3. If $af(n/b) \le cf(n)$ for some constant $c < 1$ and for all sufficiently large $n$,
then $g(n) = \Theta(f(n))$.

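Proof For case 1, we have $f(n) = O(n^{\log_b a - \epsilon})$, which implies that
$f(n/b^j) = O((n/b^j)^{\log_b a - \epsilon})$. Substituting into equation (4.22) yields

\[
g(n) = O\left( \sum_{j=0}^{\log_b n - 1} a^j \left( \frac{n}{b^j} \right)^{\log_b a - \epsilon} \right) . \tag{4.23}
\]

We bound the summation within the $O$-notation by factoring out terms and
simplifying, which leaves an increasing geometric series:

\[
\begin{aligned}
\sum_{j=0}^{\log_b n - 1} a^j \left( \frac{n}{b^j} \right)^{\log_b a - \epsilon}
  &= n^{\log_b a - \epsilon} \sum_{j=0}^{\log_b n - 1} \left( \frac{a b^{\epsilon}}{b^{\log_b a}} \right)^j \\
  &= n^{\log_b a - \epsilon} \sum_{j=0}^{\log_b n - 1} (b^{\epsilon})^j \\
  &= n^{\log_b a - \epsilon} \left( \frac{b^{\epsilon \log_b n} - 1}{b^{\epsilon} - 1} \right) \\
  &= n^{\log_b a - \epsilon} \left( \frac{n^{\epsilon} - 1}{b^{\epsilon} - 1} \right) .
\end{aligned}
\]

Since $b$ and $\epsilon$ are constants, we can rewrite the last expression as
$n^{\log_b a - \epsilon} O(n^{\epsilon}) = O(n^{\log_b a})$. Substituting this expression for the summation in equation (4.23) yields

\[ g(n) = O(n^{\log_b a}) , \]

thereby proving case 1.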

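Because case 2 assumes that $f(n) = \Theta(n^{\log_b a})$, we have that
$f(n/b^j) = \Theta((n/b^j)^{\log_b a})$. Substituting into equation (4.22) yields

\[
g(n) = \Theta\left( \sum_{j=0}^{\log_b n - 1} a^j \left( \frac{n}{b^j} \right)^{\log_b a} \right) . \tag{4.24}
\]

We bound the summation within the $\Theta$-notation as in case 1, but this time we do not
obtain a geometric series. Instead, we discover that every term of the summation
is the same:

\[
\begin{aligned}
\sum_{j=0}^{\log_b n - 1} a^j \left( \frac{n}{b^j} \right)^{\log_b a}
  &= n^{\log_b a} \sum_{j=0}^{\log_b n - 1} \left( \frac{a}{b^{\log_b a}} \right)^j \\
  &= n^{\log_b a} \sum_{j=0}^{\log_b n - 1} 1 \\
  &= n^{\log_b a} \log_b n .
\end{aligned}
\]

Substituting this expression for the summation in equation (4.24) yields

\[
\begin{aligned}
g(n) &= \Theta(n^{\log_b a} \log_b n) \\
     &= \Theta(n^{\log_b a} \lg n) ,
\end{aligned}
\]

proving case 2.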

We prove case 3 similarly. Since $f(n)$ appears in the definition (4.22) of $g(n)$
and all terms of $g(n)$ are nonnegative, we can conclude that $g(n) = \Omega(f(n))$ for
exact powers of $b$. We assume in the statement of the lemma that $af(n/b) \le cf(n)$
for some constant $c < 1$ and all sufficiently large $n$. We rewrite this assumption
as $f(n/b) \le (c/a) f(n)$ and iterate $j$ times, yielding $f(n/b^j) \le (c/a)^j f(n)$ or,
equivalently, $a^j f(n/b^j) \le c^j f(n)$, where we assume that the values we iterate
on are sufficiently large. Since the last, and smallest, such value is $n/b^{j-1}$, it is
enough to assume that $n/b^{j-1}$ is sufficiently large.

Substituting into equation (4.22) and simplifying yields a geometric series, but
unlike the series in case 1, this one has decreasing terms. We use an $O(1)$ term to
capture the terms that are not covered by our assumption that $n$ is sufficiently large:

\[
\begin{aligned}
g(n) &= \sum_{j=0}^{\log_b n - 1} a^j f(n/b^j) \\
     &\le \sum_{j=0}^{\log_b n - 1} c^j f(n) + O(1) \\
     &\le f(n) \sum_{j=0}^{\infty} c^j + O(1) \\
     &= O(f(n)) ,
\end{aligned}
\]

since $c$ is a constant. Thus, we can conclude that $g(n) = \Theta(f(n))$ for exact powers
of $b$. With case 3 proved, the proof of the lemma is complete. ■
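The three behaviors of the summation $g(n)$ can be observed on small instances. The sketch below evaluates the sum in equation (4.22) for integer $a$ and $b$ (an assumption that keeps every term an integer) and is meant to be run with one driving function per case.

```python
from typing import Callable


def g(n: int, a: int, b: int, f: Callable[[int], int]) -> int:
    """g(n) = sum over j = 0 .. log_b(n) - 1 of a^j * f(n / b^j), for n a power of b."""
    total, weight, size = 0, 1, n
    while size > 1:
        total += weight * f(size)  # the term a^j * f(n / b^j)
        weight *= a
        size //= b
    return total
```

With $a = 4$, $b = 2$ and $f(n) = n$ (case 1, $\epsilon = 1$) the terms grow geometrically and the sum equals $n(n-1) = O(n^2)$. With $f(n) = n^2$ (case 2) every term equals $n^2$ and the sum is $n^2 \lg n$. With $a = 2$, $b = 2$ and $f(n) = n^2$ (case 3, $c = 1/2$) the terms shrink geometrically and the sum lies between $n^2$ and $2n^2$.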

We  can  now  prove  a  version  of  the  master  theorem  for  the  case  in  which  n  is  an 
exact  power  of  b. 


Lemma 4.4

Let $a \ge 1$ and $b > 1$ be constants, and let $f(n)$ be a nonnegative function defined
on exact powers of $b$. Define $T(n)$ on exact powers of $b$ by the recurrence

\[
T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 , \\ aT(n/b) + f(n) & \text{if } n = b^i , \end{cases}
\]

where $i$ is a positive integer. Then $T(n)$ has the following asymptotic bounds for
exact powers of $b$:

1. If $f(n) = O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$.

2. If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \lg n)$.

3. If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some constant $\epsilon > 0$, and if $af(n/b) \le cf(n)$ for
some constant $c < 1$ and all sufficiently large $n$, then $T(n) = \Theta(f(n))$.


Proof We use the bounds in Lemma 4.3 to evaluate the summation (4.21) from
Lemma 4.2. For case 1, we have

\[
T(n) = \Theta(n^{\log_b a}) + O(n^{\log_b a}) = \Theta(n^{\log_b a}) ,
\]

and for case 2,

\[
T(n) = \Theta(n^{\log_b a}) + \Theta(n^{\log_b a} \lg n) = \Theta(n^{\log_b a} \lg n) .
\]

For case 3,

\[
T(n) = \Theta(n^{\log_b a}) + \Theta(f(n)) = \Theta(f(n)) ,
\]

because $f(n) = \Omega(n^{\log_b a + \epsilon})$.
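
The proof above rests on the unrolled form (4.21) of the recurrence. The following sketch compares a direct recursive evaluation with that unrolled sum for exact powers of $b$. It assumes an integer $b \ge 2$ and fixes the base case $T(1) = 1$ as one concrete choice of the $\Theta(1)$ term; the function names are illustrative, not from the text.

```python
def T_recursive(n, a, b, f):
    """Evaluate T(n) = a*T(n/b) + f(n) with T(1) = 1, for n an exact power of b."""
    if n == 1:
        return 1
    return a * T_recursive(n // b, a, b, f) + f(n)

def T_summation(n, a, b, f):
    """Evaluate a^k * T(1) + sum_{j=0}^{k-1} a^j * f(n / b^j), where k = log_b n."""
    k, m = 0, n
    while m > 1:  # integer computation of log_b n for exact powers of b
        m //= b
        k += 1
    leaves = a ** k  # a^(log_b n) = n^(log_b a) leaves, each costing T(1) = 1
    return leaves + sum(a ** j * f(n // b ** j) for j in range(k))

# Example: a = 2, b = 2, f(n) = n (the merge-sort shape), n = 8.
print(T_recursive(8, 2, 2, lambda n: n), T_summation(8, 2, 2, lambda n: n))
```

The two functions agree because the sum simply groups the recursion tree's costs level by level, with the leaves accounting for the $\Theta(n^{\log_b a})$ term.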


4.6.2  Floors  and  ceilings 

To  complete  the  proof  of  the  master  theorem,  we  must  now  extend  our  analysis  to 
the  situation  in  which  floors  and  ceilings  appear  in  the  master  recurrence,  so  that 
the  recurrence  is  defined  for  all  integers,  not  for  just  exact  powers  of  b.  Obtaining 
a  lower  bound  on 


\[
T(n) = a\,T(\lceil n/b \rceil) + f(n) \tag{4.25}
\]

and an upper bound on

\[
T(n) = a\,T(\lfloor n/b \rfloor) + f(n) \tag{4.26}
\]

is routine, since we can push through the bound $\lceil n/b \rceil \ge n/b$ in the first case to
yield the desired result, and we can push through the bound $\lfloor n/b \rfloor \le n/b$ in the
second case. We use much the same technique to lower-bound the recurrence (4.26)
as to upper-bound the recurrence (4.25), and so we shall present only this latter
bound.

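We modify the recursion tree of Figure 4.7 to produce the recursion tree in Figure 4.8.
As we go down in the recursion tree, we obtain a sequence of recursive
invocations on the arguments

\[
n , \quad
\lceil n/b \rceil , \quad
\lceil \lceil n/b \rceil / b \rceil , \quad
\lceil \lceil \lceil n/b \rceil / b \rceil / b \rceil , \quad
\ldots
\]

Let us denote the $j$th element in the sequence by $n_j$, where

\[
n_j =
\begin{cases}
n & \text{if } j = 0 , \\
\lceil n_{j-1} / b \rceil & \text{if } j > 0 .
\end{cases}
\tag{4.27}
\]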
[Figure 4.8: recursion tree. The root costs $f(n)$ and has $a$ children, each costing $f(n_1)$; level $j$ contributes a total of $a^j f(n_j)$; the tree has height $\lfloor \log_b n \rfloor$; the leaves contribute $\Theta(n^{\log_b a})$. Total: $\Theta(n^{\log_b a}) + \sum_{j=0}^{\lfloor \log_b n \rfloor - 1} a^j f(n_j)$.]

Figure 4.8 The recursion tree generated by $T(n) = aT(\lceil n/b \rceil) + f(n)$. The recursive argument $n_j$
is given by equation (4.27).


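Our first goal is to determine the depth $k$ such that $n_k$ is a constant. Using the
inequality $\lceil x \rceil \le x + 1$, we obtain

\[
\begin{aligned}
n_0 &\le n , \\
n_1 &\le \frac{n}{b} + 1 , \\
n_2 &\le \frac{n}{b^2} + \frac{1}{b} + 1 , \\
n_3 &\le \frac{n}{b^3} + \frac{1}{b^2} + \frac{1}{b} + 1 , \\
    &\;\;\vdots
\end{aligned}
\]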

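In general, we have

\[
\begin{aligned}
n_j &\le \frac{n}{b^j} + \sum_{i=0}^{j-1} \frac{1}{b^i} \\
    &< \frac{n}{b^j} + \sum_{i=0}^{\infty} \frac{1}{b^i} \\
    &= \frac{n}{b^j} + \frac{b}{b-1} .
\end{aligned}
\]

Letting $j = \lfloor \log_b n \rfloor$, we obtain

\[
\begin{aligned}
n_{\lfloor \log_b n \rfloor}
    &< \frac{n}{b^{\lfloor \log_b n \rfloor}} + \frac{b}{b-1} \\
    &< \frac{n}{b^{\log_b n - 1}} + \frac{b}{b-1} \\
    &= \frac{n}{n/b} + \frac{b}{b-1} \\
    &= b + \frac{b}{b-1} \\
    &= O(1) ,
\end{aligned}
\]

and thus we see that at depth $\lfloor \log_b n \rfloor$, the problem size is at most a constant.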
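
The bound on $n_j$ can be checked numerically. The sketch below generates the sequence of equation (4.27) down to depth $\lfloor \log_b n \rfloor$ and tests $n_j < n/b^j + b/(b-1)$ with exact rational arithmetic. It assumes an integer $b \ge 2$ (the text allows any real $b > 1$), and the function names are illustrative.

```python
from fractions import Fraction

def ceiling_sequence(n, b):
    """Return [n_0, ..., n_k] with n_0 = n, n_j = ceil(n_{j-1} / b), k = floor(log_b n)."""
    k, power = 0, b
    while power <= n:  # integer computation of floor(log_b n)
        power *= b
        k += 1
    seq = [n]
    for _ in range(k):
        seq.append(-(-seq[-1] // b))  # ceiling division for positive integers
    return seq

def satisfies_bound(n, b):
    """Check n_j < n / b^j + b / (b - 1) for every j, using exact fractions."""
    slack = Fraction(b, b - 1)
    return all(nj < Fraction(n, b ** j) + slack
               for j, nj in enumerate(ceiling_sequence(n, b)))

print(ceiling_sequence(1000, 3))
print(satisfies_bound(1000, 3))
```

The last element of the sequence is the problem size at depth $\lfloor \log_b n \rfloor$, which the derivation bounds by the constant $b + b/(b-1)$.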
From Figure 4.8, we see that

\[
T(n) = \Theta(n^{\log_b a}) + \sum_{j=0}^{\lfloor \log_b n \rfloor - 1} a^j f(n_j) , \tag{4.28}
\]

which is much the same as equation (4.21), except that $n$ is an arbitrary integer and
not restricted to be an exact power of $b$.

We can now evaluate the summation

\[
g(n) = \sum_{j=0}^{\lfloor \log_b n \rfloor - 1} a^j f(n_j) \tag{4.29}
\]

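from equation (4.28) in a manner analogous to the proof of Lemma 4.3. Beginning
with case 3, if $a f(\lceil n/b \rceil) \le c f(n)$ for $n > b + b/(b-1)$, where $c < 1$ is a constant,
then it follows that $a^j f(n_j) \le c^j f(n)$. Therefore, we can evaluate the sum in
equation (4.29) just as in Lemma 4.3. For case 2, we have $f(n) = \Theta(n^{\log_b a})$. If we
can show that $f(n_j) = O(n^{\log_b a} / a^j) = O((n/b^j)^{\log_b a})$, then the proof for case 2
of Lemma 4.3 will go through. Observe that $j \le \lfloor \log_b n \rfloor$ implies $b^j / n \le 1$. The
bound $f(n) = O(n^{\log_b a})$ implies that there exists a constant $c > 0$ such that for all
sufficiently large $n_j$,

\[
\begin{aligned}
f(n_j) &\le c \left( \frac{n}{b^j} + \frac{b}{b-1} \right)^{\log_b a} \\
       &= c \left( \frac{n}{b^j} \left( 1 + \frac{b^j}{n} \cdot \frac{b}{b-1} \right) \right)^{\log_b a} \\
       &= c \left( \frac{n^{\log_b a}}{a^j} \right) \left( 1 + \frac{b^j}{n} \cdot \frac{b}{b-1} \right)^{\log_b a} \\
       &\le c \left( \frac{n^{\log_b a}}{a^j} \right) \left( 1 + \frac{b}{b-1} \right)^{\log_b a} \\
       &= O\!\left( \frac{n^{\log_b a}}{a^j} \right) ,
\end{aligned}
\]

since $c(1 + b/(b-1))^{\log_b a}$ is a constant. Thus, we have proved case 2. The proof
of case 1 is almost identical. The key is to prove the bound $f(n_j) = O(n^{\log_b a - \epsilon})$,
which is similar to the corresponding proof of case 2, though the algebra is more
intricate.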

We  have  now  proved  the  upper  bounds  in  the  master  theorem  for  all  integers  n . 
The  proof  of  the  lower  bounds  is  similar. 

Exercises 

4.6-1 ★

Give a simple and exact expression for $n_j$ in equation (4.27) for the case in which $b$
is a positive integer instead of an arbitrary real number.

4.6-2 ★

Show that if $f(n) = \Theta(n^{\log_b a} \lg^k n)$, where $k \ge 0$, then the master recurrence has
solution $T(n) = \Theta(n^{\log_b a} \lg^{k+1} n)$. For simplicity, confine your analysis to exact
powers of $b$.

4.6-3 ★

Show that case 3 of the master theorem is overstated, in the sense that the regularity
condition $a f(n/b) \le c f(n)$ for some constant $c < 1$ implies that there exists a
constant $\epsilon > 0$ such that $f(n) = \Omega(n^{\log_b a + \epsilon})$.


Problems


4-1  Recurrence  examples 

Give asymptotic upper and lower bounds for $T(n)$ in each of the following recurrences.
Assume that $T(n)$ is constant for $n \le 2$. Make your bounds as tight as
possible, and justify your answers.

a. $T(n) = 2T(n/2) + n^4$.

b. $T(n) = T(7n/10) + n$.

c. $T(n) = 16T(n/4) + n^2$.

d. $T(n) = 7T(n/3) + n^2$.

e. $T(n) = 7T(n/2) + n^2$.

f. $T(n) = 2T(n/4) + \sqrt{n}$.

g. $T(n) = T(n - 2) + n^2$.

4-2  Parameter-passing  costs 

Throughout this book, we assume that parameter passing during procedure calls
takes constant time, even if an $N$-element array is being passed. This assumption
is valid in most systems because a pointer to the array is passed, not the array itself.
This problem examines the implications of three parameter-passing strategies:

1. An array is passed by pointer. Time = $\Theta(1)$.

2. An array is passed by copying. Time = $\Theta(N)$, where $N$ is the size of the array.

3. An array is passed by copying only the subrange that might be accessed by the
called procedure. Time = $\Theta(q - p + 1)$ if the subarray $A[p..q]$ is passed.

a. Consider the recursive binary search algorithm for finding a number in a sorted
array (see Exercise 2.3-5). Give recurrences for the worst-case running times
of binary search when arrays are passed using each of the three methods above,
and give good upper bounds on the solutions of the recurrences. Let $N$ be the
size of the original problem and $n$ be the size of a subproblem.

b. Redo part (a) for the Merge-Sort algorithm from Section 2.3.1.

4-3 More recurrence examples

Give asymptotic upper and lower bounds for $T(n)$ in each of the following recurrences.
Assume that $T(n)$ is constant for sufficiently small $n$. Make your bounds
as tight as possible, and justify your answers.

a. $T(n) = 4T(n/3) + n \lg n$.

b. $T(n) = 3T(n/3) + n / \lg n$.

c. $T(n) = 4T(n/2) + n^2 \sqrt{n}$.

d. $T(n) = 3T(n/3 - 2) + n/2$.

e. $T(n) = 2T(n/2) + n / \lg n$.

f. $T(n) = T(n/2) + T(n/4) + T(n/8) + n$.

g. $T(n) = T(n - 1) + 1/n$.

h. $T(n) = T(n - 1) + \lg n$.

i. $T(n) = T(n - 2) + 1 / \lg n$.

j. $T(n) = \sqrt{n}\, T(\sqrt{n}) + n$.

4-4  Fibonacci  numbers 

This problem develops properties of the Fibonacci numbers, which are defined
by recurrence (3.22). We shall use the technique of generating functions to solve
the Fibonacci recurrence. Define the generating function (or formal power series)
$\mathcal{F}$ as

\[
\begin{aligned}
\mathcal{F}(z) &= \sum_{i=0}^{\infty} F_i z^i \\
               &= 0 + z + z^2 + 2z^3 + 3z^4 + 5z^5 + 8z^6 + 13z^7 + 21z^8 + \cdots ,
\end{aligned}
\]

where $F_i$ is the $i$th Fibonacci number.

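a. Show that $\mathcal{F}(z) = z + z\mathcal{F}(z) + z^2 \mathcal{F}(z)$.

b. Show that

\[
\begin{aligned}
\mathcal{F}(z) &= \frac{z}{1 - z - z^2} \\
               &= \frac{z}{(1 - \phi z)(1 - \hat{\phi} z)} \\
               &= \frac{1}{\sqrt{5}} \left( \frac{1}{1 - \phi z} - \frac{1}{1 - \hat{\phi} z} \right) ,
\end{aligned}
\]

where

\[
\phi = \frac{1 + \sqrt{5}}{2} = 1.61803\ldots
\]

and

\[
\hat{\phi} = \frac{1 - \sqrt{5}}{2} = -0.61803\ldots .
\]

c. Show that

\[
\mathcal{F}(z) = \sum_{i=0}^{\infty} \frac{1}{\sqrt{5}} \left( \phi^i - \hat{\phi}^i \right) z^i .
\]

d. Use part (c) to prove that $F_i = \phi^i / \sqrt{5}$ for $i > 0$, rounded to the nearest integer.
(Hint: Observe that $|\hat{\phi}| < 1$.)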
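
The rounding claim in part (d) can be checked numerically for small indices. The sketch below compares an iteratively computed Fibonacci number with $\phi^i / \sqrt{5}$ rounded to the nearest integer. The function names are illustrative, and floating-point arithmetic limits the check to modest $i$; it is numeric evidence only, not the requested proof.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio

def fib(i):
    """Return the i-th Fibonacci number, with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a

def fib_rounded(i):
    """Return phi^i / sqrt(5), rounded to the nearest integer."""
    return round(PHI ** i / math.sqrt(5))

for i in range(1, 11):
    print(i, fib(i), fib_rounded(i))
```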


4-5  Chip  testing 

Professor Diogenes has $n$ supposedly identical integrated-circuit chips that in principle
are capable of testing each other. The professor’s test jig accommodates two
chips at a time. When the jig is loaded, each chip tests the other and reports whether
it is good or bad. A good chip always reports accurately whether the other chip is
good or bad, but the professor cannot trust the answer of a bad chip. Thus, the four
possible outcomes of a test are as follows:

Chip A says    Chip B says    Conclusion
B is good      A is good      both are good, or both are bad
B is good      A is bad       at least one is bad
B is bad       A is good      at least one is bad
B is bad       A is bad       at least one is bad

a. Show that if more than $n/2$ chips are bad, the professor cannot necessarily determine
which chips are good using any strategy based on this kind of pairwise
test. Assume that the bad chips can conspire to fool the professor.

b. Consider the problem of finding a single good chip from among $n$ chips, assuming
that more than $n/2$ of the chips are good. Show that $\lfloor n/2 \rfloor$ pairwise
tests are sufficient to reduce the problem to one of nearly half the size.

c. Show that the good chips can be identified with $\Theta(n)$ pairwise tests, assuming
that more than $n/2$ of the chips are good. Give and solve the recurrence that
describes the number of tests.

4-6  Monge  arrays 

An $m \times n$ array $A$ of real numbers is a Monge array if for all $i$, $j$, $k$, and $l$ such
that $1 \le i < k \le m$ and $1 \le j < l \le n$, we have

\[
A[i, j] + A[k, l] \le A[i, l] + A[k, j] .
\]

In  other  words,  whenever  we  pick  two  rows  and  two  columns  of  a  Monge  array  and 
consider  the  four  elements  at  the  intersections  of  the  rows  and  the  columns,  the  sum 
of  the  upper-left  and  lower-right  elements  is  less  than  or  equal  to  the  sum  of  the 
lower-left  and  upper-right  elements.  For  example,  the  following  array  is  Monge: 

10  17  13  28  23 

17  22  16  29  23 

24  28  22  34  24 

11  13  6  17  7 

45  44  32  37  23 

36  33  19  21  6 

75  66  51  53  34 

a. Prove that an array is Monge if and only if for all $i = 1, 2, \ldots, m - 1$ and
$j = 1, 2, \ldots, n - 1$, we have

\[
A[i, j] + A[i + 1, j + 1] \le A[i, j + 1] + A[i + 1, j] .
\]

(Hint: For the “if” part, use induction separately on rows and columns.)

b.  The  following  array  is  not  Monge.  Change  one  element  in  order  to  make  it 
Monge.  (Hint:  Use  part  (a).) 

37  23  22  32
21   6   7  10
53  34  30  31
32  13   9   6
43  21  15   8

c. Let $f(i)$ be the index of the column containing the leftmost minimum element
of row $i$. Prove that $f(1) \le f(2) \le \cdots \le f(m)$ for any $m \times n$ Monge array.

d. Here is a description of a divide-and-conquer algorithm that computes the leftmost
minimum element in each row of an $m \times n$ Monge array $A$:

Construct a submatrix $A'$ of $A$ consisting of the even-numbered rows of $A$.
Recursively determine the leftmost minimum for each row of $A'$. Then
compute the leftmost minimum in the odd-numbered rows of $A$.

Explain how to compute the leftmost minimum in the odd-numbered rows of $A$
(given that the leftmost minimum of the even-numbered rows is known) in
$O(m + n)$ time.

e. Write the recurrence describing the running time of the algorithm described in
part (d). Show that its solution is $O(m + n \log m)$.
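
To experiment with the definition, the following sketch tests the Monge condition directly over every pair of rows and every pair of columns. It is a brute-force check of the definition, not the divide-and-conquer algorithm of part (d); the function name and the 0-based indexing are assumptions made for illustration. The two arrays are the ones given in the problem statement.

```python
from itertools import combinations

def is_monge(A):
    """Check A[i][j] + A[k][l] <= A[i][l] + A[k][j] for all i < k and j < l."""
    m, n = len(A), len(A[0])
    return all(A[i][j] + A[k][l] <= A[i][l] + A[k][j]
               for i, k in combinations(range(m), 2)
               for j, l in combinations(range(n), 2))

MONGE_EXAMPLE = [
    [10, 17, 13, 28, 23],
    [17, 22, 16, 29, 23],
    [24, 28, 22, 34, 24],
    [11, 13,  6, 17,  7],
    [45, 44, 32, 37, 23],
    [36, 33, 19, 21,  6],
    [75, 66, 51, 53, 34],
]

NOT_MONGE_EXAMPLE = [
    [37, 23, 22, 32],
    [21,  6,  7, 10],
    [53, 34, 30, 31],
    [32, 13,  9,  6],
    [43, 21, 15,  8],
]

print(is_monge(MONGE_EXAMPLE), is_monge(NOT_MONGE_EXAMPLE))
```

The check examines every pair of rows and every pair of columns, so it is only practical for small arrays.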


Chapter  notes 

Divide-and-conquer  as  a  technique  for  designing  algorithms  dates  back  to  at  least 
1962  in  an  article  by  Karatsuba  and  Ofman  [194].  It  might  have  been  used  well  before
then,  however;  according  to  Heideman,  Johnson,  and  Burrus  [163],  C.  F.  Gauss
devised  the  first  fast  Fourier  transform  algorithm  in  1805,  and  Gauss’s  formulation 
breaks  the  problem  into  smaller  subproblems  whose  solutions  are  combined. 

The  maximum-subarray  problem  in  Section  4.1  is  a  minor  variation  on  a  problem
studied  by  Bentley  [43,  Chapter  7]. 

Strassen’s  algorithm  [325]  caused  much  excitement  when  it  was  published 
in  1969.  Before  then,  few  imagined  the  possibility  of  an  algorithm  asymptotically 
faster  than  the  basic  Square-Matrix-Multiply  procedure.  The  asymptotic 
upper  bound  for  matrix  multiplication  has  been  improved  since  then.  The  most 
asymptotically  efficient  algorithm  for  multiplying  n  x  n  matrices  to  date,  due  to 
Coppersmith  and  Winograd  [78],  has  a  running  time  of  $O(n^{2.376})$.  The  best  lower
bound  known  is  just  the  obvious  $\Omega(n^2)$  bound  (obvious  because  we  must  fill  in  $n^2$
elements  of  the  product  matrix). 

From  a  practical  point  of  view,  Strassen’s  algorithm  is  often  not  the  method  of 
choice  for  matrix  multiplication,  for  four  reasons: 

1. The constant factor hidden in the $\Theta(n^{\lg 7})$ running time of Strassen’s algorithm
is larger than the constant factor in the $\Theta(n^3)$-time Square-Matrix-Multiply
procedure.

2. When the matrices are sparse, methods tailored for sparse matrices are faster.

3. Strassen’s algorithm is not quite as numerically stable as Square-Matrix-Multiply.
In other words, because of the limited precision of computer arithmetic
on noninteger values, larger errors accumulate in Strassen’s algorithm
than in Square-Matrix-Multiply.

4. The submatrices formed at the levels of recursion consume space.


The  latter  two  reasons  were  mitigated  around  1990.  Higham  [167]  demonstrated 
that  the  difference  in  numerical  stability  had  been  overemphasized;  although 
Strassen’s  algorithm  is  too  numerically  unstable  for  some  applications,  it  is  within 
acceptable  limits  for  others.  Bailey,  Lee,  and  Simon  [32]  discuss  techniques  for 
reducing  the  memory  requirements  for  Strassen’s  algorithm. 

In  practice,  fast  matrix-multiplication  implementations  for  dense  matrices  use 
Strassen’s  algorithm  for  matrix  sizes  above  a  “crossover  point,”  and  they  switch 
to  a  simpler  method  once  the  subproblem  size  reduces  to  below  the  crossover 
point.  The  exact  value  of  the  crossover  point  is  highly  system  dependent.  Analyses 
that  count  operations  but  ignore  effects  from  caches  and  pipelining  have  produced 
crossover  points  as  low  as  n  =  8  (by  Higham  [167])  or  n  =  12  (by  Huss-Lederman
et  al.  [186]).  D’Alberto  and  Nicolau  [81]  developed  an  adaptive  scheme,  which 
determines  the  crossover  point  by  benchmarking  when  their  software  package  is 
installed.  They  found  crossover  points  on  various  systems  ranging  from  n  =  400 
to  n  =  2150,  and  they  could  not  find  a  crossover  point  on  a  couple  of  systems.

Recurrences  were  studied  as  early  as  1202  by  L.  Fibonacci,  for  whom  the  Fibonacci
numbers  are  named.  A.  De  Moivre  introduced  the  method  of  generating
functions  (see  Problem  4-4)  for  solving  recurrences.  The  master  method  is  adapted 
from  Bentley,  Haken,  and  Saxe  [44],  which  provides  the  extended  method  justified 
by  Exercise  4.6-2.  Knuth  [209]  and  Liu  [237]  show  how  to  solve  linear  recurrences 
using  the  method  of  generating  functions.  Purdom  and  Brown  [287]  and  Graham, 
Knuth,  and  Patashnik  [152]  contain  extended  discussions  of  recurrence  solving. 

Several  researchers,  including  Akra  and  Bazzi  [13],  Roura  [299],  Verma  [346], 
and  Yap  [360],  have  given  methods  for  solving  more  general  divide-and-conquer 
recurrences  than  are  solved  by  the  master  method.  We  describe  the  result  of  Akra 
and  Bazzi  here,  as  modified  by  Leighton  [228].  The  Akra-Bazzi  method  works  for 
recurrences  of  the  form 


\[
T(x) =
\begin{cases}
\Theta(1) & \text{if } 1 \le x \le x_0 , \\
\sum_{i=1}^{k} a_i T(b_i x) + f(x) & \text{if } x > x_0 ,
\end{cases}
\tag{4.30}
\]

where

•  $x \ge 1$ is a real number,

•  $x_0$ is a constant such that $x_0 \ge 1/b_i$ and $x_0 \ge 1/(1 - b_i)$ for $i = 1, 2, \ldots, k$,

•  $a_i$ is a positive constant for $i = 1, 2, \ldots, k$,

•  $b_i$ is a constant in the range $0 < b_i < 1$ for $i = 1, 2, \ldots, k$,

•  $k \ge 1$ is an integer constant, and

•  $f(x)$ is a nonnegative function that satisfies the polynomial-growth condition:
there exist positive constants $c_1$ and $c_2$ such that for all $x \ge 1$, for
$i = 1, 2, \ldots, k$, and for all $u$ such that $b_i x \le u \le x$, we have $c_1 f(x) \le
f(u) \le c_2 f(x)$. (If $|f'(x)|$ is upper-bounded by some polynomial in $x$, then
$f(x)$ satisfies the polynomial-growth condition. For example, $f(x) = x^{\alpha} \lg^{\beta} x$
satisfies this condition for any real constants $\alpha$ and $\beta$.)

Although the master method does not apply to a recurrence such as $T(n) =
T(\lfloor n/3 \rfloor) + T(\lfloor 2n/3 \rfloor) + O(n)$, the Akra-Bazzi method does. To solve the
recurrence (4.30), we first find the unique real number $p$ such that $\sum_{i=1}^{k} a_i b_i^p = 1$.
(Such a $p$ always exists.) The solution to the recurrence is then

\[
T(x) = \Theta\!\left( x^p \left( 1 + \int_1^x \frac{f(u)}{u^{p+1}} \, du \right) \right) .
\]

The Akra-Bazzi method can be somewhat difficult to use, but it serves in solving
recurrences that model division of the problem into substantially unequally sized
subproblems. The master method is simpler to use, but it applies only when
subproblem sizes are equal.
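
The first step of the Akra-Bazzi method, finding the exponent $p$ with $\sum_i a_i b_i^p = 1$, is easy to carry out numerically. The sketch below uses bisection, relying on the fact that the sum is strictly decreasing in $p$ when every $a_i > 0$ and $0 < b_i < 1$. The function name, the bracketing strategy, and the fixed iteration count are illustrative assumptions rather than part of the method as published.

```python
def akra_bazzi_exponent(a, b, iterations=200):
    """Return p with sum(a_i * b_i**p) = 1, assuming every a_i > 0 and 0 < b_i < 1."""
    def g(p):
        return sum(ai * bi ** p for ai, bi in zip(a, b)) - 1.0

    lo, hi = -1.0, 1.0
    while g(lo) < 0:   # g is decreasing, so move lo left until g(lo) >= 0
        lo *= 2
    while g(hi) > 0:   # and move hi right until g(hi) <= 0
        hi *= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# T(n) = T(n/3) + T(2n/3) + O(n): a = (1, 1), b = (1/3, 2/3).
print(akra_bazzi_exponent([1, 1], [1 / 3, 2 / 3]))
```

When there is a single term with $a_1 = a$ and $b_1 = 1/b$, the exponent found this way is $\log_b a$, the same exponent that appears in the master method.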
This  chapter  introduces  probabilistic  analysis  and  randomized  algorithms.  If  you
are  unfamiliar  with  the  basics  of  probability  theory,  you  should  read  Appendix  C, 
which  reviews  this  material.  We  shall  revisit  probabilistic  analysis  and  randomized 
algorithms  several  times  throughout  this  book. 


5.1  The  hiring  problem 

Suppose  that  you  need  to  hire  a  new  office  assistant.  Your  previous  attempts  at 
hiring  have  been  unsuccessful,  and  you  decide  to  use  an  employment  agency.  The 
employment  agency  sends  you  one  candidate  each  day.  You  interview  that  person 
and  then  decide  either  to  hire  that  person  or  not.  You  must  pay  the  employment 
agency  a  small  fee  to  interview  an  applicant.  To  actually  hire  an  applicant  is  more 
costly,  however,  since  you  must  fire  your  current  office  assistant  and  pay  a  substantial
hiring  fee  to  the  employment  agency.  You  are  committed  to  having,  at  all  times,
the  best  possible  person  for  the  job.  Therefore,  you  decide  that,  after  interviewing 
each  applicant,  if  that  applicant  is  better  qualified  than  the  current  office  assistant, 
you  will  fire  the  current  office  assistant  and  hire  the  new  applicant.  You  are  willing 
to  pay  the  resulting  price  of  this  strategy,  but  you  wish  to  estimate  what  that  price 
will  be. 

The procedure Hire-Assistant, given below, expresses this strategy for hiring
in pseudocode. It assumes that the candidates for the office assistant job are numbered
1 through n. The procedure assumes that you are able to, after interviewing
candidate i, determine whether candidate i is the best candidate you have seen so
far. To initialize, the procedure creates a dummy candidate, numbered 0, who is
less qualified than each of the other candidates.

Hire-Assistant(n)
1  best = 0                // candidate 0 is a least-qualified dummy candidate
2  for i = 1 to n
3      interview candidate i
4      if candidate i is better than candidate best
5          best = i
6          hire candidate i
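
A minimal Python sketch of the same strategy follows. It assumes that each candidate's quality can be summarized by a comparable numeric score; the pseudocode only requires pairwise comparison, so the `scores` list and the function name are illustrative assumptions. The function returns the number of hires, which is the quantity the cost analysis focuses on.

```python
def hire_assistant(scores):
    """Return how many times Hire-Assistant hires when candidates arrive in this order."""
    best = float("-inf")  # stands in for dummy candidate 0, worse than everyone
    hires = 0
    for score in scores:  # interview candidate i
        if score > best:  # candidate i is better than the current best
            best = score
            hires += 1    # hire candidate i
    return hires

print(hire_assistant([2, 1, 4, 3]))
```

Every candidate is interviewed exactly once, while the number of hires depends on the order of arrival.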

The  cost  model  for  this  problem  differs  from  the  model  described  in  Chapter  2. 
We  focus  not  on  the  running  time  of  HIRE-ASSISTANT,  but  instead  on  the  costs 
incurred  by  interviewing  and  hiring.  On  the  surface,  analyzing  the  cost  of  this  algorithm
may  seem  very  different  from  analyzing  the  running  time  of,  say,  merge  sort.
The  analytical  techniques  used,  however,  are  identical  whether  we  are  analyzing 
cost  or  running  time.  In  either  case,  we  are  counting  the  number  of  times  certain 
basic  operations  are  executed. 

Interviewing has a low cost, say $c_i$, whereas hiring is expensive, costing $c_h$. Letting
$m$ be the number of people hired, the total cost associated with this algorithm
is $O(c_i n + c_h m)$. No matter how many people we hire, we always interview $n$
candidates and thus always incur the cost $c_i n$ associated with interviewing. We
therefore concentrate on analyzing $c_h m$, the hiring cost. This quantity varies with
each run of the algorithm.

This  scenario  serves  as  a  model  for  a  common  computational  paradigm.  We  often
need  to  find  the  maximum  or  minimum  value  in  a  sequence  by  examining  each
element  of  the  sequence  and  maintaining  a  current  “winner.”  The  hiring  problem 
models  how  often  we  update  our  notion  of  which  element  is  currently  winning. 

Worst-case  analysis 

In  the  worst  case,  we  actually  hire  every  candidate  that  we  interview.  This  situation 
occurs  if  the  candidates  come  in  strictly  increasing  order  of  quality,  in  which  case 
we hire $n$ times, for a total hiring cost of $O(c_h n)$.

Of  course,  the  candidates  do  not  always  come  in  increasing  order  of  quality.  In 
fact,  we  have  no  idea  about  the  order  in  which  they  arrive,  nor  do  we  have  any 
control  over  this  order.  Therefore,  it  is  natural  to  ask  what  we  expect  to  happen  in 
a  typical  or  average  case. 

Probabilistic  analysis 

Probabilistic analysis is the use of probability in the analysis of problems. Most
commonly, we use probabilistic analysis to analyze the running time of an algorithm.
Sometimes we use it to analyze other quantities, such as the hiring cost
in procedure Hire-Assistant. In order to perform a probabilistic analysis, we
must use knowledge of, or make assumptions about, the distribution of the inputs.
Then we analyze our algorithm, computing an average-case running time, where
we take the average over the distribution of the possible inputs. Thus we are, in
effect, averaging the running time over all possible inputs. When reporting such a
running time, we will refer to it as the average-case running time.

We must be very careful in deciding on the distribution of inputs. For some
problems, we may reasonably assume something about the set of all possible inputs,
and then we can use probabilistic analysis as a technique for designing an
efficient algorithm and as a means for gaining insight into a problem. For other
problems, we cannot describe a reasonable input distribution, and in these cases
we cannot use probabilistic analysis.

For the hiring problem, we can assume that the applicants come in a random
order. What does that mean for this problem? We assume that we can compare
any two candidates and decide which one is better qualified; that is, there is a
total order on the candidates. (See Appendix B for the definition of a total order.)
Thus, we can rank each candidate with a unique number from 1 through $n$,
using $rank(i)$ to denote the rank of applicant $i$, and adopt the convention that a
higher rank corresponds to a better qualified applicant. The ordered list $\langle rank(1),
rank(2), \ldots, rank(n) \rangle$ is a permutation of the list $\langle 1, 2, \ldots, n \rangle$. Saying that the
applicants come in a random order is equivalent to saying that this list of ranks is
equally likely to be any one of the $n!$ permutations of the numbers 1 through $n$.
Alternatively, we say that the ranks form a uniform random permutation; that is,
each of the possible $n!$ permutations appears with equal probability.
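
For very small $n$, the uniform-random-permutation assumption can be made concrete by enumerating all $n!$ orderings and averaging the number of hires. The sketch below does exactly that with exact fractions. The function names are illustrative assumptions, and the brute-force enumeration is feasible only for tiny $n$.

```python
from fractions import Fraction
from itertools import permutations

def count_hires(ranks):
    """Number of hires made by the Hire-Assistant strategy for this arrival order."""
    best, hires = 0, 0  # rank 0 plays the role of the dummy candidate
    for rank in ranks:
        if rank > best:
            best, hires = rank, hires + 1
    return hires

def average_hires(n):
    """Average number of hires over all n! equally likely rank permutations."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(count_hires(p) for p in perms), len(perms))

for n in range(1, 6):
    print(n, average_hires(n))
```

Each permutation receives weight $1/n!$, which is precisely what it means for the ranks to form a uniform random permutation.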

Section  5.2  contains  a  probabilistic  analysis  of  the  hiring  problem. 

Randomized  algorithms 

In  order  to  use  probabilistic  analysis,  we  need  to  know  something  about  the  distribution
of  the  inputs.  In  many  cases,  we  know  very  little  about  the  input  distribution.
Even  if  we  do  know  something  about  the  distribution,  we  may  not  be  able  to  model 
this  knowledge  computationally.  Yet  we  often  can  use  probability  and  randomness 
as  a  tool  for  algorithm  design  and  analysis,  by  making  the  behavior  of  part  of  the 
algorithm  random. 

In the hiring problem, it may seem as if the candidates are being presented to us
in a random order, but we have no way of knowing whether or not they really are.
Thus, in order to develop a randomized algorithm for the hiring problem, we must
have greater control over the order in which we interview the candidates. We will,
therefore, change the model slightly. We say that the employment agency has $n$
candidates, and they send us a list of the candidates in advance. On each day, we
choose, randomly, which candidate to interview. Although we know nothing about
the candidates (besides their names), we have made a significant change. Instead
of relying on a guess that the candidates come to us in a random order, we have
gained control of the process and enforced a random order.

More generally, we call an algorithm randomized if its behavior is determined
not only by its input but also by values produced by a random-number generator.
We shall assume that we have at our disposal a random-number generator
Random. A call to Random(a, b) returns an integer between a and b, inclusive,
with each such integer being equally likely. For example, Random(0, 1)
produces 0 with probability 1/2, and it produces 1 with probability 1/2. A call to
Random(3, 7) returns either 3, 4, 5, 6, or 7, each with probability 1/5. Each integer
returned by Random is independent of the integers returned on previous calls.
You may imagine Random as rolling a (b − a + 1)-sided die to obtain its output.
(In practice, most programming environments offer a pseudorandom-number
generator, a deterministic algorithm returning numbers that “look” statistically
random.)
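
In Python, the standard library's pseudorandom generator offers a close analogue of Random(a, b): `random.randint(a, b)` returns an integer in the inclusive range from a to b. The sketch below wraps it and also shows one way to enforce a random interview order by shuffling the candidate list. The wrapper names and the use of `shuffle` are illustrative assumptions, not the book's procedures.

```python
import random

def random_int(a, b, rng=random):
    """Analogue of Random(a, b): an integer in [a, b], both endpoints included."""
    return rng.randint(a, b)

def randomized_interview_order(n, rng=random):
    """Return candidates 1..n in a pseudorandomly chosen interview order."""
    order = list(range(1, n + 1))
    rng.shuffle(order)  # in-place pseudorandom shuffle
    return order

rng = random.Random(2024)  # seeded generator, so that runs are reproducible
print([random_int(3, 7, rng) for _ in range(10)])
print(randomized_interview_order(8, rng))
```

Passing a seeded `random.Random` instance keeps experiments reproducible; as the text notes, such a generator is deterministic and only looks statistically random.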

When analyzing the running time of a randomized algorithm, we take the expectation
of the running time over the distribution of values returned by the random
number generator. We distinguish these algorithms from those in which the input
is random by referring to the running time of a randomized algorithm as an
expected running time. In general, we discuss the average-case running time when
the probability distribution is over the inputs to the algorithm, and we discuss the
expected running time when the algorithm itself makes random choices.

Exercises 


5.1-1

Show that the assumption that we are always able to determine which candidate is
best, in line 4 of procedure Hire-Assistant, implies that we know a total order
on the ranks of the candidates.

5.1-2 ★

Describe an implementation of the procedure Random(a, b) that only makes calls
to Random(0, 1). What is the expected running time of your procedure, as a
function of a and b?

5.1-3 ★

Suppose that you want to output 0 with probability 1/2 and 1 with probability 1/2.
At your disposal is a procedure Biased-Random, that outputs either 0 or 1. It
outputs 1 with some probability p and 0 with probability 1 − p, where 0 < p < 1,
but you do not know what p is. Give an algorithm that uses Biased-Random
as a subroutine, and returns an unbiased answer, returning 0 with probability 1/2
and 1 with probability 1/2. What is the expected running time of your algorithm
as a function of p?


5.2  Indicator  random  variables 


In order to analyze many algorithms, including the hiring problem, we use indicator
random variables. Indicator random variables provide a convenient method for
converting between probabilities and expectations. Suppose we are given a sample
space $S$ and an event $A$. Then the indicator random variable $\mathrm{I}\{A\}$ associated with
event $A$ is defined as

\[
\mathrm{I}\{A\} =
\begin{cases}
1 & \text{if } A \text{ occurs} , \\
0 & \text{if } A \text{ does not occur} .
\end{cases}
\tag{5.1}
\]

As  a  simple  example,  let  us  determine  the  expected  number  of  heads  that  we 
obtain  when  flipping  a  fair  coin.  Our  sample  space  is  S  =  {H,  T},  with  Pr  {H}  = 
Pr{T} = 1/2. We can then define an indicator random variable $X_H$, associated
with the coin coming up heads, which is the event H. This variable counts the
number of heads obtained in this flip, and it is 1 if the coin comes up heads and 0
otherwise. We write

\[
X_H = \mathrm{I}\{H\} =
\begin{cases}
1 & \text{if $H$ occurs} , \\
0 & \text{if $T$ occurs} .
\end{cases}
\]

The expected number of heads obtained in one flip of the coin is simply the expected
value of our indicator variable $X_H$:

\[
\begin{aligned}
\mathrm{E}[X_H] &= \mathrm{E}[\mathrm{I}\{H\}] \\
&= 1 \cdot \Pr\{H\} + 0 \cdot \Pr\{T\} \\
&= 1 \cdot (1/2) + 0 \cdot (1/2) \\
&= 1/2 .
\end{aligned}
\]

Thus the expected number of heads obtained by one flip of a fair coin is 1/2. As
the  following  lemma  shows,  the  expected  value  of  an  indicator  random  variable 
associated  with  an  event  A  is  equal  to  the  probability  that  A  occurs. 

Lemma  5.1 

Given a sample space S and an event A in the sample space S, let $X_A = \mathrm{I}\{A\}$.
Then $\mathrm{E}[X_A] = \Pr\{A\}$.

Proof  By the definition of an indicator random variable from equation (5.1) and
the definition of expected value, we have

\[
\begin{aligned}
\mathrm{E}[X_A] &= \mathrm{E}[\mathrm{I}\{A\}] \\
&= 1 \cdot \Pr\{A\} + 0 \cdot \Pr\{\overline{A}\} \\
&= \Pr\{A\} ,
\end{aligned}
\]

where $\overline{A}$ denotes S - A, the complement of A. ■


Although  indicator  random  variables  may  seem  cumbersome  for  an  application 
such  as  counting  the  expected  number  of  heads  on  a  flip  of  a  single  coin,  they  are 
useful  for  analyzing  situations  in  which  we  perform  repeated  random  trials.  For 
example,  indicator  random  variables  give  us  a  simple  way  to  arrive  at  the  result 
of  equation  (C.37).  In  this  equation,  we  compute  the  number  of  heads  in  n  coin 
flips  by  considering  separately  the  probability  of  obtaining  0  heads,  1  head,  2  heads, 
etc.  The  simpler  method  proposed  in  equation  (C.38)  instead  uses  indicator  random 
variables implicitly. Making this argument more explicit, we let $X_i$ be the indicator
random variable associated with the event in which the ith flip comes up heads:
$X_i = \mathrm{I}\{\text{the $i$th flip results in the event $H$}\}$. Let X be the random variable denoting
the total number of heads in the n coin flips, so that

\[
X = \sum_{i=1}^{n} X_i .
\]

We  wish  to  compute  the  expected  number  of  heads,  and  so  we  take  the  expectation 
of  both  sides  of  the  above  equation  to  obtain 


\[
\mathrm{E}[X] = \mathrm{E}\left[ \sum_{i=1}^{n} X_i \right] .
\]

The above equation gives the expectation of the sum of n indicator random variables.
By Lemma 5.1, we can easily compute the expectation of each of the random
variables. By equation (C.21), linearity of expectation, it is easy to compute the
expectation of the sum: it equals the sum of the expectations of the n random
variables. Linearity of expectation makes the use of indicator random variables a
powerful analytical technique; it applies even when there is dependence among the
random variables. We now can easily compute the expected number of heads:
\[
\begin{aligned}
\mathrm{E}[X] &= \mathrm{E}\left[ \sum_{i=1}^{n} X_i \right] \\
&= \sum_{i=1}^{n} \mathrm{E}[X_i] \\
&= \sum_{i=1}^{n} 1/2 \\
&= n/2 .
\end{aligned}
\]


Thus,  compared  to  the  method  used  in  equation  (C.37),  indicator  random  variables 
greatly simplify the calculation. We shall use indicator random variables throughout
this book.
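
As a quick check on the n/2 result, the following Python sketch (not part of the original text; the function names are illustrative) enumerates all 2^n equally likely outcomes of n fair coin flips, sums the indicator variables for each outcome, and averages. A second function sums the per-flip expectations, which is the linearity-of-expectation route taken above.

```python
from fractions import Fraction
from itertools import product


def expected_heads_by_enumeration(n):
    """Average the number of heads over all 2**n equally likely outcomes."""
    total = 0
    for outcome in product("HT", repeat=n):
        # X_i = I{the ith flip is H}; X is the sum of these indicators.
        total += sum(1 for flip in outcome if flip == "H")
    return Fraction(total, 2 ** n)


def expected_heads_by_linearity(n):
    """Sum E[X_i] = Pr{H} = 1/2 over the n flips."""
    return sum(Fraction(1, 2) for _ in range(n))
```

Both functions return the same exact fraction for any small n, but the enumeration takes time exponential in n while the linearity argument needs only n additions.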


Analysis  of  the  hiring  problem  using  indicator  random  variables 

Returning  to  the  hiring  problem,  we  now  wish  to  compute  the  expected  number  of 
times  that  we  hire  a  new  office  assistant.  In  order  to  use  a  probabilistic  analysis,  we 
assume  that  the  candidates  arrive  in  a  random  order,  as  discussed  in  the  previous 
section.  (We  shall  see  in  Section  5.3  how  to  remove  this  assumption.)  Let  X  be  the 
random variable whose value equals the number of times we hire a new office assistant.
We could then apply the definition of expected value from equation (C.20)
to obtain

\[
\mathrm{E}[X] = \sum_{x=1}^{n} x \Pr\{X = x\} ,
\]

but  this  calculation  would  be  cumbersome.  We  shall  instead  use  indicator  random 
variables  to  greatly  simplify  the  calculation. 

To use indicator random variables, instead of computing E[X] by defining one
variable associated with the number of times we hire a new office assistant, we
define n variables related to whether or not each particular candidate is hired. In
particular, we let $X_i$ be the indicator random variable associated with the event in
which the ith candidate is hired. Thus,

\[
X_i = \mathrm{I}\{\text{candidate $i$ is hired}\} =
\begin{cases}
1 & \text{if candidate $i$ is hired} , \\
0 & \text{if candidate $i$ is not hired} ,
\end{cases}
\]

and

\[
X = X_1 + X_2 + \cdots + X_n .
\tag{5.2}
\]
By Lemma 5.1, we have that

\[
\mathrm{E}[X_i] = \Pr\{\text{candidate $i$ is hired}\} ,
\]

and  we  must  therefore  compute  the  probability  that  lines  5-6  of  Hire-Assistant 
are  executed. 

Candidate i is hired, in line 6, exactly when candidate i is better than each of
candidates 1 through i - 1. Because we have assumed that the candidates arrive in
a random order, the first i candidates have appeared in a random order. Any one of
these first i candidates is equally likely to be the best-qualified so far. Candidate i
has a probability of 1/i of being better qualified than candidates 1 through i - 1
and thus a probability of 1/i of being hired. By Lemma 5.1, we conclude that

\[
\mathrm{E}[X_i] = 1/i .
\tag{5.3}
\]

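Now we can compute E[X]:

\begin{align}
\mathrm{E}[X] &= \mathrm{E}\left[ \sum_{i=1}^{n} X_i \right] && \text{(by equation (5.2))} \tag{5.4} \\
&= \sum_{i=1}^{n} \mathrm{E}[X_i] && \text{(by linearity of expectation)} \notag \\
&= \sum_{i=1}^{n} 1/i && \text{(by equation (5.3))} \notag \\
&= \ln n + O(1) && \text{(by equation (A.7))} . \tag{5.5}
\end{align}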

Even though we interview n people, we actually hire only approximately ln n of
them, on average. We summarize this result in the following lemma.

Lemma 5.2

Assuming that the candidates are presented in a random order, algorithm Hire-Assistant
has an average-case total hiring cost of $O(c_h \ln n)$.

Proof  The bound follows immediately from our definition of the hiring cost
and equation (5.5), which shows that the expected number of hires is approximately
ln n. ■

The average-case hiring cost is a significant improvement over the worst-case
hiring cost of $O(c_h n)$.
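
Equation (5.3) and the harmonic sum behind equation (5.5) can be checked exactly for small n. The Python sketch below is an illustration rather than anything from the original text, and its function names are assumptions. It enumerates all n! arrival orders, measures the fraction in which candidate i is hired, and adds those fractions up.

```python
from fractions import Fraction
from itertools import permutations


def hire_probability(n, i):
    """Exact Pr{candidate i is hired} over all n! arrival orders (i is 1-based)."""
    hired = 0
    total = 0
    for ranks in permutations(range(1, n + 1)):
        total += 1
        # X_i = 1 exactly when candidate i outranks candidates 1 .. i-1.
        if ranks[i - 1] == max(ranks[:i]):
            hired += 1
    return Fraction(hired, total)


def expected_hires(n):
    """E[X] as the sum of the E[X_i], by linearity of expectation."""
    return sum(hire_probability(n, i) for i in range(1, n + 1))
```

For n = 5, each hire_probability(5, i) equals 1/i, and expected_hires(5) equals 1 + 1/2 + 1/3 + 1/4 + 1/5 = 137/60.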
Exercises


5.2-1

In Hire-Assistant, assuming that the candidates are presented in a random order,
what is the probability that you hire exactly one time? What is the probability
that you hire exactly n times?


5.2-2

In Hire-Assistant, assuming that the candidates are presented in a random order,
what is the probability that you hire exactly twice?


5.2-3 

Use  indicator  random  variables  to  compute  the  expected  value  of  the  sum  of  n  dice. 


5.2-4 

Use  indicator  random  variables  to  solve  the  following  problem,  which  is  known  as 
the  hat-check  problem.  Each  of  n  customers  gives  a  hat  to  a  hat-check  person  at  a 
restaurant.  The  hat-check  person  gives  the  hats  back  to  the  customers  in  a  random 
order.  What  is  the  expected  number  of  customers  who  get  back  their  own  hat? 


5.2-5 

Let A[1 . . n] be an array of n distinct numbers. If i < j and A[i] > A[j], then
the pair (i, j) is called an inversion of A. (See Problem 2-4 for more on inversions.)
Suppose that the elements of A form a uniform random permutation of
(1, 2, ..., n). Use indicator random variables to compute the expected number of
inversions.


5.3  Randomized  algorithms 

In  the  previous  section,  we  showed  how  knowing  a  distribution  on  the  inputs  can 
help  us  to  analyze  the  average-case  behavior  of  an  algorithm.  Many  times,  we  do 
not  have  such  knowledge,  thus  precluding  an  average-case  analysis.  As  mentioned 
in  Section  5.1,  we  may  be  able  to  use  a  randomized  algorithm. 

For  a  problem  such  as  the  hiring  problem,  in  which  it  is  helpful  to  assume  that 
all  permutations  of  the  input  are  equally  likely,  a  probabilistic  analysis  can  guide 
the  development  of  a  randomized  algorithm.  Instead  of  assuming  a  distribution 
of  inputs,  we  impose  a  distribution.  In  particular,  before  running  the  algorithm, 
we  randomly  permute  the  candidates  in  order  to  enforce  the  property  that  every 
permutation  is  equally  likely.  Although  we  have  modified  the  algorithm,  we  still 
expect to hire a new office assistant approximately ln n times. But now we expect
this to be the case for any input, rather than for inputs drawn from a particular
distribution.

Let us further explore the distinction between probabilistic analysis and randomized
algorithms. In Section 5.2, we claimed that, assuming that the candidates arrive
in a random order, the expected number of times we hire a new office assistant
is about ln n. Note that the algorithm here is deterministic; for any particular input,
the  number  of  times  a  new  office  assistant  is  hired  is  always  the  same.  Furthermore, 
the  number  of  times  we  hire  a  new  office  assistant  differs  for  different  inputs,  and  it 
depends  on  the  ranks  of  the  various  candidates.  Since  this  number  depends  only  on 
the  ranks  of  the  candidates,  we  can  represent  a  particular  input  by  listing,  in  order, 
the ranks of the candidates, i.e., (rank(1), rank(2), ..., rank(n)). Given the rank
list A1 = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10), a new office assistant is always hired 10 times,
since each successive candidate is better than the previous one, and lines 5-6 are
executed in each iteration. Given the list of ranks A2 = (10, 9, 8, 7, 6, 5, 4, 3, 2, 1),
a new office assistant is hired only once, in the first iteration. Given a list of ranks
A3 = (5, 2, 1, 8, 4, 7, 10, 9, 3, 6), a new office assistant is hired three times,
upon interviewing the candidates with ranks 5, 8, and 10. Recalling that the cost
of our algorithm depends on how many times we hire a new office assistant, we
see that there are expensive inputs such as A1, inexpensive inputs such as A2, and
moderately expensive inputs such as A3.
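
The three rank lists can be traced mechanically. The sketch below is a Python rendering of the deterministic hiring loop, written for illustration; the name count_hires and the convention that a larger rank means a better candidate are assumptions that match the examples in the text.

```python
def count_hires(ranks):
    """Number of hires made by the deterministic hiring loop on this arrival order."""
    best = 0   # rank of the dummy candidate 0; real ranks are assumed positive
    hires = 0
    for rank in ranks:
        if rank > best:   # the candidate is better than the best so far
            best = rank
            hires += 1
    return hires


A1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
A2 = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
A3 = [5, 2, 1, 8, 4, 7, 10, 9, 3, 6]
```

Calling count_hires on A1, A2, and A3 gives 10, 1, and 3, the same counts as in the discussion above, and it gives them on every run because nothing in the loop is random.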

Consider,  on  the  other  hand,  the  randomized  algorithm  that  first  permutes  the 
candidates  and  then  determines  the  best  candidate.  In  this  case,  we  randomize  in 
the  algorithm,  not  in  the  input  distribution.  Given  a  particular  input,  say  A3  above, 
we  cannot  say  how  many  times  the  maximum  is  updated,  because  this  quantity 
differs  with  each  run  of  the  algorithm.  The  first  time  we  run  the  algorithm  on  A3, 
it  may  produce  the  permutation  A1  and  perform  10  updates;  but  the  second  time 
we  run  the  algorithm,  we  may  produce  the  permutation  A2  and  perform  only  one 
update.  The  third  time  we  run  it,  we  may  perform  some  other  number  of  updates. 
Each  time  we  run  the  algorithm,  the  execution  depends  on  the  random  choices 
made  and  is  likely  to  differ  from  the  previous  execution  of  the  algorithm.  For  this 
algorithm  and  many  other  randomized  algorithms,  no  particular  input  elicits  its 
worst-case  behavior.  Even  your  worst  enemy  cannot  produce  a  bad  input  array, 
since  the  random  permutation  makes  the  input  order  irrelevant.  The  randomized 
algorithm performs badly only if the random-number generator produces an “unlucky”
permutation.

For the hiring problem, the only change needed in the code is to randomly permute
the array.

Randomized-Hire-Assistant(n)
1  randomly permute the list of candidates
2  best = 0          // candidate 0 is a least-qualified dummy candidate
3  for i = 1 to n
4      interview candidate i
5      if candidate i is better than candidate best
6          best = i
7          hire candidate i

With this simple change, we have created a randomized algorithm whose performance
matches that obtained by assuming that the candidates were presented in a
random order.

Lemma 5.3

The expected hiring cost of the procedure Randomized-Hire-Assistant is
$O(c_h \ln n)$.

Proof  After  permuting  the  input  array,  we  have  achieved  a  situation  identical  to 
that  of  the  probabilistic  analysis  of  Hire-Assistant.  ■ 
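
A minimal Python sketch of Randomized-Hire-Assistant follows. It is an illustration under stated assumptions: candidates are represented only by distinct positive ranks, the function returns the number of hires rather than charging costs, and the rng parameter is a hypothetical hook for supplying a seeded generator. It relies on the standard library's shuffle for line 1 of the pseudocode.

```python
import random


def randomized_hire_assistant(ranks, rng=random):
    """Return the number of hires after randomly permuting the candidates.

    `ranks` holds distinct positive ranks; a higher rank is a better candidate.
    `rng` is a hypothetical hook so callers can pass a seeded random.Random.
    """
    candidates = list(ranks)   # copy, so the caller's list is left untouched
    rng.shuffle(candidates)    # line 1: randomly permute the candidates
    best = 0                   # line 2: dummy candidate 0
    hires = 0
    for rank in candidates:    # lines 3-7
        if rank > best:
            best = rank
            hires += 1
    return hires
```

Running it repeatedly on the expensive input (1, 2, ..., 10) gives a different count from run to run, with an average close to 1 + 1/2 + ... + 1/10, which is roughly 2.93.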

Comparing  Lemmas  5.2  and  5.3  highlights  the  difference  between  probabilistic 
analysis  and  randomized  algorithms.  In  Lemma  5.2,  we  make  an  assumption  about 
the  input.  In  Lemma  5.3,  we  make  no  such  assumption,  although  randomizing  the 
input  takes  some  additional  time.  To  remain  consistent  with  our  terminology,  we 
couched  Lemma  5.2  in  terms  of  the  average-case  hiring  cost  and  Lemma  5.3  in 
terms  of  the  expected  hiring  cost.  In  the  remainder  of  this  section,  we  discuss  some 
issues  involved  in  randomly  permuting  inputs. 

Randomly  permuting  arrays 

Many  randomized  algorithms  randomize  the  input  by  permuting  the  given  input 
array.  (There  are  other  ways  to  use  randomization.)  Here,  we  shall  discuss  two 
methods  for  doing  so.  We  assume  that  we  are  given  an  array  A  which,  without  loss 
of  generality,  contains  the  elements  1  through  n .  Our  goal  is  to  produce  a  random 
permutation  of  the  array. 

One common method is to assign each element A[i] of the array a random priority
P[i], and then sort the elements of A according to these priorities. For example,
if our initial array is A = (1, 2, 3, 4) and we choose random priorities
P = (36, 3, 62, 19), we would produce an array B = (2, 4, 1, 3), since the second
priority is the smallest, followed by the fourth, then the first, and finally the third.
We call this procedure Permute-By-Sorting:



Permute-By-Sorting(A)
1  n = A.length
2  let P[1 . . n] be a new array
3  for i = 1 to n
4      P[i] = Random(1, n^3)
5  sort A, using P as sort keys

Line 4 chooses a random number between 1 and n^3. We use a range of 1 to n^3
to make it likely that all the priorities in P are unique. (Exercise 5.3-5 asks you
to prove that the probability that all entries are unique is at least 1 - 1/n, and
Exercise 5.3-6 asks how to implement the algorithm even if two or more priorities
are identical.) Let us assume that all the priorities are unique.

The time-consuming step in this procedure is the sorting in line 5. As we shall
see in Chapter 8, if we use a comparison sort, sorting takes Ω(n lg n) time. We
can achieve this lower bound, since we have seen that merge sort takes Θ(n lg n)
time. (We shall see other comparison sorts that take Θ(n lg n) time in Part II.
Exercise 8.3-4 asks you to solve the very similar problem of sorting numbers in the
range 0 to n^3 - 1 in O(n) time.) After sorting, if P[i] is the jth smallest priority,
then A[i] lies in position j of the output. In this manner we obtain a permutation. It
remains to prove that the procedure produces a uniform random permutation, that
is, that the procedure is equally likely to produce every permutation of the numbers
1 through n.
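
One way to render Permute-By-Sorting in Python is sketched below. It is not the book's code: it returns a new list instead of sorting A in place, and the priorities and rng parameters are hypothetical hooks added so the worked example from the text can be replayed. As in the text, it assumes that all priorities are distinct.

```python
import random


def permute_by_sorting(a, priorities=None, rng=random):
    """Return a new list holding the elements of `a` ordered by random priority.

    `priorities` and `rng` are hypothetical hooks for testing; by default each
    priority is drawn uniformly from 1 .. n**3, as in line 4 of the pseudocode.
    This sketch assumes, as the text does, that all priorities are distinct.
    """
    n = len(a)
    if priorities is None:
        priorities = [rng.randint(1, n ** 3) for _ in range(n)]
    # Sort the positions by priority, then read off the elements in that order.
    order = sorted(range(n), key=lambda i: priorities[i])
    return [a[i] for i in order]
```

With a = [1, 2, 3, 4] and priorities = [36, 3, 62, 19], the function returns [2, 4, 1, 3], the array B from the example above.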

Lemma  5.4 

Procedure  Permute-by-Sorting  produces  a  uniform  random  permutation  of  the 
input,  assuming  that  all  priorities  are  distinct. 

Proof  We start by considering the particular permutation in which each element
A[i] receives the ith smallest priority. We shall show that this permutation
occurs with probability exactly 1/n!. For i = 1, 2, ..., n, let $E_i$ be the event
that element A[i] receives the ith smallest priority. Then we wish to compute the
probability that for all i, event $E_i$ occurs, which is

\[
\Pr\{E_1 \cap E_2 \cap E_3 \cap \cdots \cap E_{n-1} \cap E_n\} .
\]

Using Exercise C.2-5, this probability is equal to

\[
\begin{aligned}
&\Pr\{E_1\} \cdot \Pr\{E_2 \mid E_1\} \cdot \Pr\{E_3 \mid E_2 \cap E_1\} \cdot \Pr\{E_4 \mid E_3 \cap E_2 \cap E_1\} \\
&\quad \cdots \Pr\{E_i \mid E_{i-1} \cap E_{i-2} \cap \cdots \cap E_1\} \cdots \Pr\{E_n \mid E_{n-1} \cap \cdots \cap E_1\} .
\end{aligned}
\]

We have that $\Pr\{E_1\} = 1/n$ because it is the probability that one priority
chosen randomly out of a set of n is the smallest priority. Next, we observe
that $\Pr\{E_2 \mid E_1\} = 1/(n - 1)$ because given that element A[1] has the smallest
priority, each of the remaining n - 1 elements has an equal chance of having
the second smallest priority. In general, for i = 2, 3, ..., n, we have that
$\Pr\{E_i \mid E_{i-1} \cap E_{i-2} \cap \cdots \cap E_1\} = 1/(n - i + 1)$, since, given that elements A[1]
through A[i - 1] have the i - 1 smallest priorities (in order), each of the remaining
n - (i - 1) elements has an equal chance of having the ith smallest priority. Thus,
we have

\[
\Pr\{E_1 \cap E_2 \cap E_3 \cap \cdots \cap E_{n-1} \cap E_n\}
= \left(\frac{1}{n}\right) \left(\frac{1}{n-1}\right) \cdots \left(\frac{1}{2}\right) \left(\frac{1}{1}\right)
= \frac{1}{n!} ,
\]

and we have shown that the probability of obtaining the identity permutation
is 1/n!.

We can extend this proof to work for any permutation of priorities. Consider
any fixed permutation σ = (σ(1), σ(2), ..., σ(n)) of the set {1, 2, ..., n}. Let us
denote by $r_i$ the rank of the priority assigned to element A[i], where the element
with the jth smallest priority has rank j. If we define $E_i$ as the event in which
element A[i] receives the σ(i)th smallest priority, or $r_i = \sigma(i)$, the same proof
still applies. Therefore, if we calculate the probability of obtaining any particular
permutation, the calculation is identical to the one above, so that the probability of
obtaining this permutation is also 1/n!. ■


You might think that to prove that a permutation is a uniform random permutation,
it suffices to show that, for each element A[i], the probability that the element
winds up in position j is 1/n. Exercise 5.3-4 shows that this weaker condition is,
in fact, insufficient.

A better method for generating a random permutation is to permute the given
array in place. The procedure Randomize-In-Place does so in O(n) time. In
its ith iteration, it chooses the element A[i] randomly from among elements A[i]
through A[n]. Subsequent to the ith iteration, A[i] is never altered.


Randomize-In-Place(A)
1  n = A.length
2  for i = 1 to n
3      swap A[i] with A[Random(i, n)]
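
The procedure translates almost line for line into Python. The sketch below is illustrative: it uses 0-based indices, so Random(i, n) becomes a draw from i through n - 1, and the rng parameter is a hypothetical hook for passing in a seeded generator.

```python
import random


def randomize_in_place(a, rng=random):
    """Permute list `a` in place, mirroring Randomize-In-Place with 0-based indices."""
    n = len(a)
    for i in range(n):
        # Random(i, n) in the 1-based pseudocode becomes randint(i, n - 1) here;
        # randint includes both endpoints.
        j = rng.randint(i, n - 1)
        a[i], a[j] = a[j], a[i]
```

The function returns nothing and rearranges its argument, so a caller who needs the original order should copy the list first.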

We shall use a loop invariant to show that procedure Randomize-In-Place
produces a uniform random permutation. A k-permutation on a set of n elements
is a sequence containing k of the n elements, with no repetitions. (See
Appendix C.) There are n!/(n - k)! such possible k-permutations.

Lemma 5.5

Procedure Randomize-In-Place computes a uniform random permutation.

Proof  We use the following loop invariant:

Just prior to the ith iteration of the for loop of lines 2-3, for each possible
(i - 1)-permutation of the n elements, the subarray A[1 . . i - 1] contains
this (i - 1)-permutation with probability (n - i + 1)!/n!.

We  need  to  show  that  this  invariant  is  true  prior  to  the  first  loop  iteration,  that  each 
iteration  of  the  loop  maintains  the  invariant,  and  that  the  invariant  provides  a  useful 
property  to  show  correctness  when  the  loop  terminates. 

Initialization: Consider the situation just before the first loop iteration, so that
i = 1. The loop invariant says that for each possible 0-permutation, the subarray
A[1 . . 0] contains this 0-permutation with probability (n - i + 1)!/n! =
n!/n! = 1. The subarray A[1 . . 0] is an empty subarray, and a 0-permutation
has no elements. Thus, A[1 . . 0] contains any 0-permutation with probability 1,
and the loop invariant holds prior to the first iteration.

Maintenance: We assume that just before the ith iteration, each possible
(i - 1)-permutation appears in the subarray A[1 . . i - 1] with probability
(n - i + 1)!/n!, and we shall show that after the ith iteration, each possible
i-permutation appears in the subarray A[1 . . i] with probability (n - i)!/n!.
Incrementing i for the next iteration then maintains the loop invariant.

Let us examine the ith iteration. Consider a particular i-permutation, and denote
the elements in it by $(x_1, x_2, \ldots, x_i)$. This permutation consists of an
(i - 1)-permutation $(x_1, \ldots, x_{i-1})$ followed by the value $x_i$ that the algorithm
places in A[i]. Let $E_1$ denote the event in which the first i - 1 iterations have
created the particular (i - 1)-permutation $(x_1, \ldots, x_{i-1})$ in A[1 . . i - 1]. By the
loop invariant, $\Pr\{E_1\} = (n - i + 1)!/n!$. Let $E_2$ be the event that the ith iteration
puts $x_i$ in position A[i]. The i-permutation $(x_1, \ldots, x_i)$ appears in A[1 . . i] precisely
when both $E_1$ and $E_2$ occur, and so we wish to compute $\Pr\{E_2 \cap E_1\}$.
Using equation (C.14), we have

\[
\Pr\{E_2 \cap E_1\} = \Pr\{E_2 \mid E_1\} \Pr\{E_1\} .
\]

The probability $\Pr\{E_2 \mid E_1\}$ equals 1/(n - i + 1) because in line 3 the algorithm
chooses $x_i$ randomly from the n - i + 1 values in positions A[i . . n]. Thus, we
have

\[
\begin{aligned}
\Pr\{E_2 \cap E_1\} &= \Pr\{E_2 \mid E_1\} \Pr\{E_1\} \\
&= \frac{1}{n - i + 1} \cdot \frac{(n - i + 1)!}{n!} \\
&= \frac{(n - i)!}{n!} .
\end{aligned}
\]

Termination: At termination, i = n + 1, and we have that the subarray A[1 . . n]
is a given n-permutation with probability (n - (n + 1) + 1)!/n! = 0!/n! = 1/n!.

Thus,  Randomize-In-Place  produces  a  uniform  random  permutation.  ■ 
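
For small n, the conclusion of Lemma 5.5 can also be confirmed by brute force. The Python sketch below is an illustration whose function name is an assumption. It replaces the random draws with every possible sequence of choices: iteration i picks one of n - i + 1 positions, so there are n! equally likely sequences, and the function counts how many of them lead to each final arrangement.

```python
from collections import Counter
from itertools import product


def permutation_counts(n):
    """Count how often each permutation arises over all equally likely choice sequences."""
    counts = Counter()
    # In iteration i (0-based) the procedure picks j uniformly from i .. n-1,
    # so every tuple in this product is equally likely.
    for choices in product(*(range(i, n) for i in range(n))):
        a = list(range(1, n + 1))
        for i, j in enumerate(choices):
            a[i], a[j] = a[j], a[i]
        counts[tuple(a)] += 1
    return counts
```

For n = 4 there are 24 choice sequences, and each of the 24 permutations is produced by exactly one of them, so each has probability 1/4!.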

A  randomized  algorithm  is  often  the  simplest  and  most  efficient  way  to  solve  a 
problem.  We  shall  use  randomized  algorithms  occasionally  throughout  this  book. 

Exercises 


5.3-1 

Professor  Marceau  objects  to  the  loop  invariant  used  in  the  proof  of  Lemma  5.5.  He 
questions  whether  it  is  true  prior  to  the  first  iteration.  He  reasons  that  we  could  just 
as easily declare that an empty subarray contains no 0-permutations. Therefore,
the probability that an empty subarray contains a 0-permutation should be 0, thus
invalidating  the  loop  invariant  prior  to  the  first  iteration.  Rewrite  the  procedure 
Randomize-In-Place  so  that  its  associated  loop  invariant  applies  to  a  nonempty 
subarray  prior  to  the  first  iteration,  and  modify  the  proof  of  Lemma  5.5  for  your 
procedure. 


5.3-2 

Professor Kelp decides to write a procedure that produces at random any permutation
besides the identity permutation. He proposes the following procedure:

Permute-Without-Identity(A)
1  n = A.length
2  for i = 1 to n - 1
3      swap A[i] with A[Random(i + 1, n)]

Does  this  code  do  what  Professor  Kelp  intends? 


5.3-3 

Suppose that instead of swapping element A[i] with a random element from the
subarray A[i . . n], we swapped it with a random element from anywhere in the
array:

Permute-With-All(A)
1  n = A.length
2  for i = 1 to n
3      swap A[i] with A[Random(1, n)]

Does  this  code  produce  a  uniform  random  permutation?  Why  or  why  not? 

5.3-4

Professor Armstrong suggests the following procedure for generating a uniform
random permutation:

Permute-By-Cyclic(A)
1  n = A.length
2  let B[1 . . n] be a new array
3  offset = Random(1, n)
4  for i = 1 to n
5      dest = i + offset
6      if dest > n
7          dest = dest - n
8      B[dest] = A[i]
9  return B

Show that each element A[i] has a 1/n probability of winding up in any particular
position in B. Then show that Professor Armstrong is mistaken by showing that
the resulting permutation is not uniformly random.

5.3-5 ★

Prove that in the array P in procedure Permute-By-Sorting, the probability
that all elements are unique is at least 1 - 1/n.

5.3-6

Explain  how  to  implement  the  algorithm  Permute-By-Sorting  to  handle  the 
case  in  which  two  or  more  priorities  are  identical.  That  is,  your  algorithm  should 
produce  a  uniform  random  permutation,  even  if  two  or  more  priorities  are  identical. 


5.3-7 

Suppose we want to create a random sample of the set {1, 2, 3, ..., n}, that is,
an m-element subset S, where 0 ≤ m ≤ n, such that each m-subset is equally
likely to be created. One way would be to set A[i] = i for i = 1, 2, 3, ..., n,
call Randomize-In-Place(A), and then take just the first m array elements.
This method would make n calls to the Random procedure. If n is much larger
than m, we can create a random sample with fewer calls to Random. Show that
the following recursive procedure returns a random m-subset S of {1, 2, 3, ..., n},
in which each m-subset is equally likely, while making only m calls to Random:

Random-Sample(m, n)
1  if m == 0
2      return ∅
3  else S = Random-Sample(m - 1, n - 1)
4      i = Random(1, n)
5      if i ∈ S
6          S = S ∪ {n}
7      else S = S ∪ {i}
8      return S
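
For readers who want to experiment with the procedure before proving anything about it, here is a direct Python transcription. It is a sketch: the rng parameter is a hypothetical hook for a seeded generator, and the function assumes 0 ≤ m ≤ n without checking. It implements the procedure as given and does not itself establish uniformity.

```python
import random


def random_sample(m, n, rng=random):
    """Return an m-element subset of {1, ..., n}, following Random-Sample."""
    if m == 0:
        return set()
    s = random_sample(m - 1, n - 1, rng)
    i = rng.randint(1, n)   # Random(1, n); both endpoints are included
    if i in s:
        s.add(n)            # n cannot already be in s, a subset of {1, ..., n-1}
    else:
        s.add(i)
    return s
```

Each level of the recursion makes one call to the random-number generator, for m calls in total. Because the recursion is m levels deep, a very large m may exceed Python's default recursion limit.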


★  5.4  Probabilistic  analysis  and  further  uses  of  indicator  random  variables 

This advanced section further illustrates probabilistic analysis by way of four examples.
The first determines the probability that in a room of k people, two of
them share the same birthday. The second example examines what happens when
we randomly toss balls into bins. The third investigates “streaks” of consecutive
heads when we flip coins. The final example analyzes a variant of the hiring problem
in which you have to make decisions without actually interviewing all the
candidates.

5.4.1  The  birthday  paradox 

Our  first  example  is  the  birthday  paradox.  How  many  people  must  there  be  in  a 
room  before  there  is  a  50%  chance  that  two  of  them  were  born  on  the  same  day  of 
the  year?  The  answer  is  surprisingly  few.  The  paradox  is  that  it  is  in  fact  far  fewer 
than  the  number  of  days  in  a  year,  or  even  half  the  number  of  days  in  a  year,  as  we 
shall  see. 

To answer this question, we index the people in the room with the integers
1, 2, ..., k, where k is the number of people in the room. We ignore the issue
of leap years and assume that all years have n = 365 days. For i = 1, 2, ..., k,
let $b_i$ be the day of the year on which person i's birthday falls, where $1 \le b_i \le n$.
We also assume that birthdays are uniformly distributed across the n days of the
year, so that $\Pr\{b_i = r\} = 1/n$ for i = 1, 2, ..., k and r = 1, 2, ..., n.

The probability that two given people, say i and j, have matching birthdays
depends on whether the random selection of birthdays is independent. We assume
from now on that birthdays are independent, so that the probability that i's birthday
and j's birthday both fall on day r is

\[
\begin{aligned}
\Pr\{b_i = r \text{ and } b_j = r\} &= \Pr\{b_i = r\} \Pr\{b_j = r\} \\
&= 1/n^2 .
\end{aligned}
\]

Thus, the probability that they both fall on the same day is

\[
\begin{aligned}
\Pr\{b_i = b_j\} &= \sum_{r=1}^{n} \Pr\{b_i = r \text{ and } b_j = r\} \\
&= \sum_{r=1}^{n} (1/n^2) \\
&= 1/n .
\end{aligned}
\tag{5.6}
\]

More intuitively, once $b_i$ is chosen, the probability that $b_j$ is chosen to be the same
day is 1/n. Thus, the probability that i and j have the same birthday is the same
as the probability that the birthday of one of them falls on a given day. Notice,
however, that this coincidence depends on the assumption that the birthdays are
independent.

We  can  analyze  the  probability  of  at  least  2  out  of  k  people  having  matching 
birthdays  by  looking  at  the  complementary  event.  The  probability  that  at  least  two 
of  the  birthdays  match  is  1  minus  the  probability  that  all  the  birthdays  are  different. 
The event that k people have distinct birthdays is

\[
B_k = \bigcap_{i=1}^{k} A_i ,
\]

where $A_i$ is the event that person i's birthday is different from person j's for
all j < i. Since we can write $B_k = A_k \cap B_{k-1}$, we obtain from equation (C.16)
the recurrence

\[
\Pr\{B_k\} = \Pr\{B_{k-1}\} \Pr\{A_k \mid B_{k-1}\} ,
\tag{5.7}
\]

where we take $\Pr\{B_1\} = \Pr\{A_1\} = 1$ as an initial condition. In other words,
the probability that $b_1, b_2, \ldots, b_k$ are distinct birthdays is the probability that
$b_1, b_2, \ldots, b_{k-1}$ are distinct birthdays times the probability that $b_k \ne b_i$ for
i = 1, 2, ..., k - 1, given that $b_1, b_2, \ldots, b_{k-1}$ are distinct.

If $b_1, b_2, \ldots, b_{k-1}$ are distinct, the conditional probability that $b_k \ne b_i$ for
i = 1, 2, ..., k - 1 is $\Pr\{A_k \mid B_{k-1}\} = (n - k + 1)/n$, since out of the n days,
n - (k - 1) days are not taken. We iteratively apply the recurrence (5.7) to obtain
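\[
\begin{aligned}
\Pr\{B_k\} &= \Pr\{B_{k-1}\} \Pr\{A_k \mid B_{k-1}\} \\
&= \Pr\{B_{k-2}\} \Pr\{A_{k-1} \mid B_{k-2}\} \Pr\{A_k \mid B_{k-1}\} \\
&\;\;\vdots \\
&= \Pr\{B_1\} \Pr\{A_2 \mid B_1\} \Pr\{A_3 \mid B_2\} \cdots \Pr\{A_k \mid B_{k-1}\} \\
&= 1 \cdot \left(\frac{n-1}{n}\right) \left(\frac{n-2}{n}\right) \cdots \left(\frac{n-k+1}{n}\right) \\
&= 1 \cdot \left(1 - \frac{1}{n}\right) \left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{k-1}{n}\right) .
\end{aligned}
\]

Inequality (3.12), $1 + x \le e^x$, gives us

\[
\begin{aligned}
\Pr\{B_k\} &\le e^{-1/n} e^{-2/n} \cdots e^{-(k-1)/n} \\
&= e^{-\sum_{i=1}^{k-1} i/n} \\
&= e^{-k(k-1)/2n} \\
&\le 1/2
\end{aligned}
\]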


when $-k(k - 1)/2n \le \ln(1/2)$. The probability that all k birthdays are distinct
is at most 1/2 when $k(k - 1) \ge 2n \ln 2$ or, solving the quadratic equation, when
$k \ge (1 + \sqrt{1 + (8 \ln 2) n})/2$. For n = 365, we must have k ≥ 23. Thus, if at
least 23 people are in a room, the probability is at least 1/2 that at least two people
have the same birthday. On Mars, a year is 669 Martian days long; it therefore
takes 31 Martians to get the same effect.
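
The thresholds of 23 and 31 can be reproduced numerically. The Python sketch below is an illustration with assumed function names. It evaluates the exact product for Pr{B_k} from the recurrence, searches for the first k at which that probability drops to 1/2 or below, and separately evaluates the closed-form bound obtained from the quadratic.

```python
import math


def prob_all_distinct(k, n):
    """Pr{B_k}: probability that k independent uniform birthdays over n days are distinct."""
    p = 1.0
    for i in range(1, k):
        p *= (n - i) / n
    return p


def smallest_k_exact(n):
    """Smallest k for which the probability of a shared birthday is at least 1/2."""
    k = 1
    while prob_all_distinct(k, n) > 0.5:
        k += 1
    return k


def smallest_k_from_bound(n):
    """Smallest integer k satisfying k >= (1 + sqrt(1 + (8 ln 2) n)) / 2."""
    return math.ceil((1 + math.sqrt(1 + 8 * math.log(2) * n)) / 2)
```

For n = 365 and n = 669 the exact search and the bound agree, giving 23 and 31 respectively. The bound comes from an inequality, so agreement for other values of n is not guaranteed.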


An  analysis  using  indicator  random  variables 

We can use indicator random variables to provide a simpler but approximate
analysis of the birthday paradox. For each pair (i, j) of the k people in the room, we
define the indicator random variable $X_{ij}$, for $1 \le i < j \le k$, by

$$X_{ij} = I\{\text{person } i \text{ and person } j \text{ have the same birthday}\}
= \begin{cases}
1 & \text{if person } i \text{ and person } j \text{ have the same birthday} , \\
0 & \text{otherwise} .
\end{cases}$$

By equation (5.6), the probability that two people have matching birthdays is 1/n,
and thus by Lemma 5.1, we have

$$\begin{aligned}
E[X_{ij}] &= \Pr\{\text{person } i \text{ and person } j \text{ have the same birthday}\} \\
&= 1/n .
\end{aligned}$$

Letting X be the random variable that counts the number of pairs of individuals
having the same birthday, we have

$$X = \sum_{i=1}^{k} \sum_{j=i+1}^{k} X_{ij} .$$

Taking expectations of both sides and applying linearity of expectation, we obtain

$$\begin{aligned}
E[X] &= E\left[ \sum_{i=1}^{k} \sum_{j=i+1}^{k} X_{ij} \right] \\
&= \sum_{i=1}^{k} \sum_{j=i+1}^{k} E[X_{ij}] \\
&= \binom{k}{2} \frac{1}{n} \\
&= \frac{k(k-1)}{2n} .
\end{aligned}$$


When $k(k-1) \ge 2n$, therefore, the expected number of pairs of people with the
same birthday is at least 1. Thus, if we have at least $\sqrt{2n} + 1$ individuals in a room,
we can expect at least two to have the same birthday. For n = 365, if k = 28, the
expected number of pairs with the same birthday is $(28 \cdot 27)/(2 \cdot 365) \approx 1.0356$.
Thus, with at least 28 people, we expect to find at least one matching pair of
birthdays. On Mars, where a year is 669 Martian days long, we need at least 38
Martians.
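As a quick check of the arithmetic in this analysis, the sketch below (an illustration, not from the original text) evaluates $E[X] = k(k-1)/2n$ and searches for the smallest k at which it reaches 1. The names `expected_matching_pairs` and `smallest_k_expecting_a_pair` are assumptions made for this example.

```python
from math import comb


def expected_matching_pairs(k: int, n: int) -> float:
    """E[X] = C(k, 2) / n = k(k - 1) / (2n)."""
    return comb(k, 2) / n


def smallest_k_expecting_a_pair(n: int) -> int:
    """Smallest k with k(k - 1) >= 2n, i.e. with E[X] >= 1."""
    k = 2
    while k * (k - 1) < 2 * n:
        k += 1
    return k


print(round(expected_matching_pairs(28, 365), 4))
print(smallest_k_expecting_a_pair(365), smallest_k_expecting_a_pair(669))
```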

The first analysis, which used only probabilities, determined the number of people
required for the probability to exceed 1/2 that a matching pair of birthdays
exists, and the second analysis, which used indicator random variables, determined
the number such that the expected number of matching birthdays is 1. Although
the exact numbers of people differ for the two situations, they are the same
asymptotically: $\Theta(\sqrt{n})$.


5.4.2  Balls  and  bins 

Consider  a  process  in  which  we  randomly  toss  identical  balls  into  b  bins,  numbered 
1, 2, . . . ,  b.  The  tosses  are  independent,  and  on  each  toss  the  ball  is  equally  likely 
to end up in any bin. The probability that a tossed ball lands in any given bin is 1/b.
Thus, the ball-tossing process is a sequence of Bernoulli trials (see Appendix C.4)
with a probability 1/b of success, where success means that the ball falls in the
given bin. This model is particularly useful for analyzing hashing (see Chapter 11),
and we can answer a variety of interesting questions about the ball-tossing process.
(Problem C-1 asks additional questions about balls and bins.)

How many balls fall in a given bin? The number of balls that fall in a given bin
follows the binomial distribution b(k; n, 1/b). If we toss n balls, equation (C.37)
tells us that the expected number of balls that fall in the given bin is n/b.

How  many  balls  must  we  toss,  on  the  average,  until  a  given  bin  contains  a  ball? 
The  number  of  tosses  until  the  given  bin  receives  a  ball  follows  the  geometric 
distribution with probability 1/b and, by equation (C.32), the expected number of
tosses until success is 1/(1/b) = b.

How  many  balls  must  we  toss  until  every  bin  contains  at  least  one  ball?  Let  us 
call  a  toss  in  which  a  ball  falls  into  an  empty  bin  a  “hit.”  We  want  to  know  the 
expected  number  n  of  tosses  required  to  get  b  hits. 

Using the hits, we can partition the n tosses into stages. The ith stage consists of
the tosses after the (i - 1)st hit until the ith hit. The first stage consists of the first
toss, since we are guaranteed to have a hit when all bins are empty. For each toss
during the ith stage, i - 1 bins contain balls and b - i + 1 bins are empty. Thus,
for each toss in the ith stage, the probability of obtaining a hit is (b - i + 1)/b.

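Let $n_i$ denote the number of tosses in the ith stage. Thus, the number of tosses
required to get b hits is $n = \sum_{i=1}^{b} n_i$. Each random variable $n_i$ has a geometric
distribution with probability of success (b - i + 1)/b and thus, by equation (C.32),
we have

$$E[n_i] = \frac{b}{b - i + 1} .$$

By linearity of expectation, we have

$$\begin{aligned}
E[n] &= E\left[ \sum_{i=1}^{b} n_i \right] \\
&= \sum_{i=1}^{b} E[n_i] \\
&= \sum_{i=1}^{b} \frac{b}{b - i + 1} \\
&= b \sum_{i=1}^{b} \frac{1}{i} \\
&= b(\ln b + O(1)) \quad \text{(by equation (A.7))} .
\end{aligned}$$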
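The sum over stages can also be evaluated exactly for small b. The following Python sketch is an illustration only; the name `expected_tosses_to_fill` is an assumption. It adds up the per-stage expectations b/(b - i + 1) with exact rational arithmetic and compares the total with b ln b.

```python
import math
from fractions import Fraction


def expected_tosses_to_fill(b: int) -> Fraction:
    """E[n] as the sum over stages i = 1..b of b / (b - i + 1)."""
    return sum(Fraction(b, b - i + 1) for i in range(1, b + 1))


for b in (2, 3, 10, 100):
    exact = expected_tosses_to_fill(b)
    # The gap between E[n]/b and ln b is the O(1) term in the derivation.
    print(b, float(exact), b * math.log(b), float(exact) / b - math.log(b))
```

Using `Fraction` avoids rounding in the sum itself; only the final comparison with ln b is done in floating point.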

It therefore takes approximately b ln b tosses before we can expect that every bin
has a ball. This problem is also known as the coupon collector's problem, which
says that a person trying to collect each of b different coupons expects to acquire
approximately b ln b randomly obtained coupons in order to succeed.


5.4.3 Streaks


Suppose you flip a fair coin n times. What is the longest streak of consecutive
heads that you expect to see? The answer is $\Theta(\lg n)$, as the following analysis
shows.

We first prove that the expected length of the longest streak of heads is $O(\lg n)$.
The probability that each coin flip is a head is 1/2. Let $A_{ik}$ be the event that a
streak of heads of length at least k begins with the ith coin flip or, more precisely,
the event that the k consecutive coin flips i, i + 1, ..., i + k - 1 yield only heads,
where $1 \le k \le n$ and $1 \le i \le n - k + 1$. Since coin flips are mutually independent,
for any given event $A_{ik}$, the probability that all k flips are heads is

$$\Pr\{A_{ik}\} = 1/2^k . \tag{5.8}$$


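For $k = 2 \lceil \lg n \rceil$,

$$\begin{aligned}
\Pr\{A_{i, 2\lceil \lg n \rceil}\} &= 1/2^{2 \lceil \lg n \rceil} \\
&\le 1/2^{2 \lg n} \\
&= 1/n^2 ,
\end{aligned}$$

and thus the probability that a streak of heads of length at least $2 \lceil \lg n \rceil$ begins in
position i is quite small. There are at most $n - 2 \lceil \lg n \rceil + 1$ positions where such
a streak can begin. The probability that a streak of heads of length at least $2 \lceil \lg n \rceil$
begins anywhere is therefore

$$\begin{aligned}
\Pr\left\{ \bigcup_{i=1}^{n - 2\lceil \lg n \rceil + 1} A_{i, 2\lceil \lg n \rceil} \right\}
&\le \sum_{i=1}^{n - 2\lceil \lg n \rceil + 1} 1/n^2 \\
&< \sum_{i=1}^{n} 1/n^2 \\
&= 1/n ,
\end{aligned} \tag{5.9}$$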


since  by  Boole’s  inequality  (C.19),  the  probability  of  a  union  of  events  is  at  most 
the  sum  of  the  probabilities  of  the  individual  events.  (Note  that  Boole’s  inequality 
holds  even  for  events  such  as  these  that  are  not  independent.) 

We now use inequality (5.9) to bound the length of the longest streak. For
j = 0, 1, 2, ..., n, let $L_j$ be the event that the longest streak of heads has length
exactly j, and let L be the length of the longest streak. By the definition of expected
value, we have

$$E[L] = \sum_{j=0}^{n} j \Pr\{L_j\} . \tag{5.10}$$

We could try to evaluate this sum using upper bounds on each $\Pr\{L_j\}$ similar to
those computed in inequality (5.9). Unfortunately, this method would yield weak
bounds. We can use some intuition gained by the above analysis to obtain a good
bound, however. Informally, we observe that for no individual term in the
summation in equation (5.10) are both the factors j and $\Pr\{L_j\}$ large. Why? When
$j \ge 2 \lceil \lg n \rceil$, then $\Pr\{L_j\}$ is very small, and when $j < 2 \lceil \lg n \rceil$, then j is fairly
small. More formally, we note that the events $L_j$ for j = 0, 1, ..., n are disjoint,
and so the probability that a streak of heads of length at least $2 \lceil \lg n \rceil$ begins
anywhere is $\sum_{j=2\lceil \lg n \rceil}^{n} \Pr\{L_j\}$. By inequality (5.9), we have
$\sum_{j=2\lceil \lg n \rceil}^{n} \Pr\{L_j\} < 1/n$. Also, noting that $\sum_{j=0}^{n} \Pr\{L_j\} = 1$, we have that
$\sum_{j=0}^{2\lceil \lg n \rceil - 1} \Pr\{L_j\} \le 1$. Thus, we obtain

$$\begin{aligned}
E[L] &= \sum_{j=0}^{n} j \Pr\{L_j\} \\
&= \sum_{j=0}^{2\lceil \lg n \rceil - 1} j \Pr\{L_j\} + \sum_{j=2\lceil \lg n \rceil}^{n} j \Pr\{L_j\} \\
&< \sum_{j=0}^{2\lceil \lg n \rceil - 1} (2 \lceil \lg n \rceil) \Pr\{L_j\} + \sum_{j=2\lceil \lg n \rceil}^{n} n \Pr\{L_j\} \\
&= 2 \lceil \lg n \rceil \sum_{j=0}^{2\lceil \lg n \rceil - 1} \Pr\{L_j\} + n \sum_{j=2\lceil \lg n \rceil}^{n} \Pr\{L_j\} \\
&< 2 \lceil \lg n \rceil \cdot 1 + n \cdot (1/n) \\
&= O(\lg n) .
\end{aligned}$$


The probability that a streak of heads exceeds $r \lceil \lg n \rceil$ flips diminishes quickly
with r. For $r \ge 1$, the probability that a streak of at least $r \lceil \lg n \rceil$ heads starts in
position i is

$$\begin{aligned}
\Pr\{A_{i, r\lceil \lg n \rceil}\} &= 1/2^{r \lceil \lg n \rceil} \\
&\le 1/n^r .
\end{aligned}$$

Thus, the probability is at most $n/n^r = 1/n^{r-1}$ that the longest streak is at
least $r \lceil \lg n \rceil$, or equivalently, the probability is at least $1 - 1/n^{r-1}$ that the longest
streak has length less than $r \lceil \lg n \rceil$.

As an example, for n = 1000 coin flips, the probability of having a streak of at
least $2 \lceil \lg n \rceil = 20$ heads is at most 1/n = 1/1000. The chance of having a streak
longer than $3 \lceil \lg n \rceil = 30$ heads is at most $1/n^2 = 1/1{,}000{,}000$.
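These bounds can be compared with exact values for a concrete n. The sketch below is a hypothetical illustration, not taken from the text. It computes the exact probability that n fair flips contain a run of at least m heads, using a small dynamic program over the length of the current trailing run of heads. The function name `prob_streak_at_least` is an assumption.

```python
import math


def prob_streak_at_least(n: int, m: int) -> float:
    """Exact probability that n fair coin flips contain a run of >= m heads (m >= 1)."""
    # run[j]: probability that no run of m heads has occurred so far and the
    # flips seen so far end in exactly j consecutive heads (0 <= j < m).
    run = [0.0] * m
    run[0] = 1.0
    hit = 0.0  # probability that a run of m heads has already occurred
    for _ in range(n):
        hit += 0.5 * run[m - 1]          # a head completes a run of length m
        new = [0.0] * m
        new[0] = 0.5 * sum(run)          # a tail resets the trailing run
        for j in range(m - 1):
            new[j + 1] = 0.5 * run[j]    # a head extends the trailing run
        run = new
    return hit


n = 1000
lg = math.ceil(math.log2(n))
for r in (2, 3):
    # exact probability versus the bound 1/n^(r-1) from the analysis
    print(r * lg, prob_streak_at_least(n, r * lg), 1 / n ** (r - 1))
```

The exact probabilities fall below the bounds, as expected, since the union bound counts overlapping events more than once.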

We now prove a complementary lower bound: the expected length of the longest
streak of heads in n coin flips is $\Omega(\lg n)$. To prove this bound, we look for streaks
of length s by partitioning the n flips into approximately n/s groups of s flips
each. If we choose $s = \lfloor (\lg n)/2 \rfloor$, we can show that it is likely that at least one
of these groups comes up all heads, and hence it is likely that the longest streak
has length at least $s = \Omega(\lg n)$. We then show that the longest streak has expected
length $\Omega(\lg n)$.

We partition the n coin flips into at least $\lfloor n / \lfloor (\lg n)/2 \rfloor \rfloor$ groups of $\lfloor (\lg n)/2 \rfloor$
consecutive flips, and we bound the probability that no group comes up all heads.
By equation (5.8), the probability that the group starting in position i comes up all
heads is

$$\begin{aligned}
\Pr\{A_{i, \lfloor (\lg n)/2 \rfloor}\} &= 1/2^{\lfloor (\lg n)/2 \rfloor} \\
&\ge 1/\sqrt{n} .
\end{aligned}$$

The probability that a streak of heads of length at least $\lfloor (\lg n)/2 \rfloor$ does not begin
in position i is therefore at most $1 - 1/\sqrt{n}$. Since the $\lfloor n / \lfloor (\lg n)/2 \rfloor \rfloor$ groups are
formed from mutually exclusive, independent coin flips, the probability that every
one of these groups fails to be a streak of length $\lfloor (\lg n)/2 \rfloor$ is at most

$$\begin{aligned}
(1 - 1/\sqrt{n})^{\lfloor n / \lfloor (\lg n)/2 \rfloor \rfloor}
&\le (1 - 1/\sqrt{n})^{n / \lfloor (\lg n)/2 \rfloor - 1} \\
&\le (1 - 1/\sqrt{n})^{2n/\lg n - 1} \\
&\le e^{-(2n/\lg n - 1)/\sqrt{n}} \\
&= O(e^{-\lg n}) \\
&= O(1/n) .
\end{aligned}$$

For this argument, we used inequality (3.12), $1 + x \le e^x$, and the fact, which you
might want to verify, that $(2n/\lg n - 1)/\sqrt{n} \ge \lg n$ for sufficiently large n.

Thus, the probability that the longest streak exceeds $\lfloor (\lg n)/2 \rfloor$ is

$$\sum_{j=\lfloor (\lg n)/2 \rfloor + 1}^{n} \Pr\{L_j\} \ge 1 - O(1/n) . \tag{5.11}$$

We can now calculate a lower bound on the expected length of the longest streak,
beginning with equation (5.10) and proceeding in a manner similar to our analysis
of the upper bound:

$$\begin{aligned}
E[L] &= \sum_{j=0}^{n} j \Pr\{L_j\} \\
&= \sum_{j=0}^{\lfloor (\lg n)/2 \rfloor} j \Pr\{L_j\} + \sum_{j=\lfloor (\lg n)/2 \rfloor + 1}^{n} j \Pr\{L_j\} \\
&\ge \sum_{j=0}^{\lfloor (\lg n)/2 \rfloor} 0 \cdot \Pr\{L_j\} + \sum_{j=\lfloor (\lg n)/2 \rfloor + 1}^{n} \lfloor (\lg n)/2 \rfloor \Pr\{L_j\} \\
&= 0 \cdot \sum_{j=0}^{\lfloor (\lg n)/2 \rfloor} \Pr\{L_j\} + \lfloor (\lg n)/2 \rfloor \sum_{j=\lfloor (\lg n)/2 \rfloor + 1}^{n} \Pr\{L_j\} \\
&\ge 0 + \lfloor (\lg n)/2 \rfloor (1 - O(1/n)) \quad \text{(by inequality (5.11))} \\
&= \Omega(\lg n) .
\end{aligned}$$


As with the birthday paradox, we can obtain a simpler but approximate analysis
using indicator random variables. We let $X_{ik} = I\{A_{ik}\}$ be the indicator random
variable associated with a streak of heads of length at least k beginning with the
ith coin flip. To count the total number of such streaks, we define

$$X = \sum_{i=1}^{n-k+1} X_{ik} .$$

Taking expectations and using linearity of expectation, we have

$$\begin{aligned}
E[X] &= E\left[ \sum_{i=1}^{n-k+1} X_{ik} \right] \\
&= \sum_{i=1}^{n-k+1} E[X_{ik}] \\
&= \sum_{i=1}^{n-k+1} \Pr\{A_{ik}\} \\
&= \sum_{i=1}^{n-k+1} 1/2^k \\
&= \frac{n - k + 1}{2^k} .
\end{aligned}$$


By plugging in various values for k, we can calculate the expected number of
streaks of length k. If this number is large (much greater than 1), then we expect
many streaks of length k to occur and the probability that one occurs is high. If
this number is small (much less than 1), then we expect few streaks of length k to
occur and the probability that one occurs is low. If k = c lg n, for some positive
constant c, we obtain

$$\begin{aligned}
E[X] &= \frac{n - c \lg n + 1}{2^{c \lg n}} \\
&= \frac{n - c \lg n + 1}{n^c} \\
&= \frac{1}{n^{c-1}} - \frac{(c \lg n - 1)/n}{n^{c-1}} \\
&= \Theta(1/n^{c-1}) .
\end{aligned}$$


If c is large, the expected number of streaks of length c lg n is small, and we
conclude that they are unlikely to occur. On the other hand, if c = 1/2, then we obtain
$E[X] = \Theta(1/n^{1/2 - 1}) = \Theta(n^{1/2})$, and we expect that there are a large number
of streaks of length (1/2) lg n. Therefore, one streak of such a length is likely to
occur. From these rough estimates alone, we can conclude that the expected length
of the longest streak is $\Theta(\lg n)$.
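To see how sharply the expected count changes with c, one can plug numbers into $E[X] = (n - k + 1)/2^k$. The sketch below is illustrative only. It uses n = 1024, so that lg n = 10 and k = c lg n is an integer for the chosen values of c. The name `expected_streak_starts` is an assumption.

```python
import math


def expected_streak_starts(n: int, k: int) -> float:
    """E[X] = (n - k + 1) / 2^k: expected number of positions where k heads in a row begin."""
    return (n - k + 1) / 2 ** k


n = 1024  # lg n = 10
for c in (0.5, 1, 2):
    k = round(c * math.log2(n))
    print(c, k, expected_streak_starts(n, k))
```

With c = 1/2 the expected count is in the tens, while with c = 2 it is below one in a thousand, matching the qualitative conclusion above.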


5.4.4  The  on-line  hiring  problem 

As  a  final  example,  we  consider  a  variant  of  the  hiring  problem.  Suppose  now  that 
we  do  not  wish  to  interview  all  the  candidates  in  order  to  find  the  best  one.  We 
also  do  not  wish  to  hire  and  fire  as  we  find  better  and  better  applicants.  Instead,  we 
are  willing  to  settle  for  a  candidate  who  is  close  to  the  best,  in  exchange  for  hiring 
exactly  once.  We  must  obey  one  company  requirement:  after  each  interview  we 
must  either  immediately  offer  the  position  to  the  applicant  or  immediately  reject  the 
applicant.  What  is  the  trade-off  between  minimizing  the  amount  of  interviewing 
and  maximizing  the  quality  of  the  candidate  hired? 

We  can  model  this  problem  in  the  following  way.  After  meeting  an  applicant, 
we are able to give each one a score; let score(i) denote the score we give to the ith
applicant, and assume that no two applicants receive the same score. After we have
seen j applicants, we know which of the j has the highest score, but we do not
know whether any of the remaining n - j applicants will receive a higher score. We
decide to adopt the strategy of selecting a positive integer k < n, interviewing and
then rejecting the first k applicants, and hiring the first applicant thereafter who has
a higher score than all preceding applicants. If it turns out that the best-qualified
applicant was among the first k interviewed, then we hire the nth applicant. We
formalize this strategy in the procedure On-Line-Maximum(k, n), which returns
the index of the candidate we wish to hire.

On-Line-Maximum(k, n)
1  bestscore = -∞
2  for i = 1 to k
3      if score(i) > bestscore
4          bestscore = score(i)
5  for i = k + 1 to n
6      if score(i) > bestscore
7          return i
8  return n
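The pseudocode translates almost line for line into a runnable form. The following Python sketch is one possible rendering under stated assumptions: scores are passed as a list of distinct numbers, indices are 0-based rather than 1-based, and the helper `estimate_success_rate` (not in the original) estimates by simulation how often the strategy hires the best applicant.

```python
import random


def on_line_maximum(score, k):
    """Return the 0-based index of the applicant hired by the strategy.

    score: list of distinct scores in interview order (assumed input format).
    k: number of initial applicants who are interviewed and rejected.
    """
    n = len(score)
    best_score = float("-inf")
    for i in range(k):                 # lines 2-4: observe the first k applicants
        if score[i] > best_score:
            best_score = score[i]
    for i in range(k, n):              # lines 5-7: hire the first one who beats them
        if score[i] > best_score:
            return i
    return n - 1                       # line 8: forced to take the last applicant


def estimate_success_rate(n, k, trials, seed=0):
    """Fraction of random interview orders in which the best applicant is hired."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        score = list(range(n))         # the score n - 1 marks the best applicant
        rng.shuffle(score)
        if score[on_line_maximum(score, k)] == n - 1:
            wins += 1
    return wins / trials


print(on_line_maximum([3, 1, 4, 2, 5], 2))
print(estimate_success_rate(100, 37, 2000))
```

The simulation is only a sanity check; its output varies with the seed and the number of trials.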

We wish to determine, for each possible value of k, the probability that we
hire the most qualified applicant. We then choose the best possible k, and
implement the strategy with that value. For the moment, assume that k is
fixed. Let $M(j) = \max_{1 \le i \le j} \{score(i)\}$ denote the maximum score among
applicants 1 through j. Let S be the event that we succeed in choosing the
best-qualified applicant, and let $S_i$ be the event that we succeed when the best-qualified
applicant is the ith one interviewed. Since the various $S_i$ are disjoint, we have
that $\Pr\{S\} = \sum_{i=1}^{n} \Pr\{S_i\}$. Noting that we never succeed when the best-qualified
applicant is one of the first k, we have that $\Pr\{S_i\} = 0$ for i = 1, 2, ..., k. Thus,
we obtain

$$\Pr\{S\} = \sum_{i=k+1}^{n} \Pr\{S_i\} . \tag{5.12}$$


We now compute $\Pr\{S_i\}$. In order to succeed when the best-qualified applicant
is the ith one, two things must happen. First, the best-qualified applicant must be
in position i, an event which we denote by $B_i$. Second, the algorithm must not
select any of the applicants in positions k + 1 through i - 1, which happens only if,
for each j such that $k + 1 \le j \le i - 1$, we find that score(j) < bestscore in line 6.
(Because scores are unique, we can ignore the possibility of score(j) = bestscore.)
In other words, all of the values score(k + 1) through score(i - 1) must be less
than M(k); if any are greater than M(k), we instead return the index of the first
one that is greater. We use $O_i$ to denote the event that none of the applicants in
position k + 1 through i - 1 are chosen. Fortunately, the two events $B_i$ and $O_i$
are independent. The event $O_i$ depends only on the relative ordering of the values
in positions 1 through i - 1, whereas $B_i$ depends only on whether the value in
position i is greater than the values in all other positions. The ordering of the
values in positions 1 through i - 1 does not affect whether the value in position i
is greater than all of them, and the value in position i does not affect the ordering
of the values in positions 1 through i - 1. Thus we can apply equation (C.15) to
obtain

$$\Pr\{S_i\} = \Pr\{B_i \cap O_i\} = \Pr\{B_i\} \Pr\{O_i\} .$$

The probability $\Pr\{B_i\}$ is clearly 1/n, since the maximum is equally likely to
be in any one of the n positions. For event $O_i$ to occur, the maximum value in
positions 1 through i - 1, which is equally likely to be in any of these i - 1 positions,
must be in one of the first k positions. Consequently, $\Pr\{O_i\} = k/(i-1)$ and
$\Pr\{S_i\} = k/(n(i-1))$. Using equation (5.12), we have

$$\begin{aligned}
\Pr\{S\} &= \sum_{i=k+1}^{n} \Pr\{S_i\} \\
&= \sum_{i=k+1}^{n} \frac{k}{n(i-1)} \\
&= \frac{k}{n} \sum_{i=k+1}^{n} \frac{1}{i-1} \\
&= \frac{k}{n} \sum_{i=k}^{n-1} \frac{1}{i} .
\end{aligned}$$
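For small n, the closed form for Pr{S} can be evaluated directly and scanned over all admissible k. The sketch below is an illustration under the assumption that $1 \le k < n$; the names `success_probability` and `best_k` are not from the text.

```python
from fractions import Fraction


def success_probability(k: int, n: int) -> Fraction:
    """Pr{S} = (k/n) * sum_{i=k}^{n-1} 1/i, for 1 <= k < n."""
    return Fraction(k, n) * sum(Fraction(1, i) for i in range(k, n))


def best_k(n: int) -> int:
    """A value of k in 1..n-1 that maximizes Pr{S} (first one on ties)."""
    return max(range(1, n), key=lambda k: success_probability(k, n))


n = 100
k = best_k(n)
print(k, float(success_probability(k, n)))
```

Exact rational arithmetic keeps the comparison between neighboring values of k free of rounding error; for large n a floating-point sum would be the more practical choice.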


We approximate by integrals to bound this summation from above and below. By
the inequalities (A.12), we have

$$\int_{k}^{n} \frac{1}{x} \, dx \le \sum_{i=k}^{n-1} \frac{1}{i} \le \int_{k-1}^{n-1} \frac{1}{x} \, dx .$$

Evaluating these definite integrals gives us the bounds

$$\frac{k}{n} (\ln n - \ln k) \le \Pr\{S\} \le \frac{k}{n} (\ln(n-1) - \ln(k-1)) ,$$

which provide a rather tight bound for $\Pr\{S\}$. Because we wish to maximize our
probability of success, let us focus on choosing the value of k that maximizes the
lower bound on $\Pr\{S\}$. (Besides, the lower-bound expression is easier to maximize
than the upper-bound expression.) Differentiating the expression $(k/n)(\ln n - \ln k)$
with respect to k, we obtain

$$\frac{1}{n} (\ln n - \ln k - 1) .$$

Setting this derivative equal to 0, we see that we maximize the lower bound on the
probability when $\ln k = \ln n - 1 = \ln(n/e)$ or, equivalently, when k = n/e. Thus,
if we implement our strategy with k = n/e, we succeed in hiring our best-qualified
applicant with probability at least 1/e.


Exercises


5.4-1 

How  many  people  must  there  be  in  a  room  before  the  probability  that  someone 
has  the  same  birthday  as  you  do  is  at  least  1/2?  How  many  people  must  there  be 
before the probability that at least two people have a birthday on July 4 is greater
than 1/2?


5.4-2

Suppose that we toss balls into b bins until some bin contains two balls. Each toss
is independent, and each ball is equally likely to end up in any bin. What is the
expected number of ball tosses?

5.4-3 *

For the analysis of the birthday paradox, is it important that the birthdays be
mutually independent, or is pairwise independence sufficient? Justify your answer.

5.4-4 *

How many people should be invited to a party in order to make it likely that there
are three people with the same birthday?

5.4-5 *

What is the probability that a k-string over a set of size n forms a k-permutation?
How does this question relate to the birthday paradox?

5.4-6 *

Suppose that n balls are tossed into n bins, where each toss is independent and the
ball is equally likely to end up in any bin. What is the expected number of empty
bins? What is the expected number of bins with exactly one ball?

5.4-7 *

Sharpen the lower bound on streak length by showing that in n flips of a fair coin,
the probability is less than 1/n that no streak longer than $\lg n - 2 \lg \lg n$ consecutive
heads occurs.


Problems


5-1  Probabilistic  counting 

With a b-bit counter, we can ordinarily only count up to $2^b - 1$. With R. Morris's
probabilistic counting, we can count up to a much larger value at the expense of
some loss of precision.

We let a counter value of i represent a count of $n_i$ for $i = 0, 1, \ldots, 2^b - 1$, where
the $n_i$ form an increasing sequence of nonnegative values. We assume that the
initial value of the counter is 0, representing a count of $n_0 = 0$. The INCREMENT
operation works on a counter containing the value i in a probabilistic manner. If
$i = 2^b - 1$, then the operation reports an overflow error. Otherwise, the INCREMENT
operation increases the counter by 1 with probability $1/(n_{i+1} - n_i)$, and it
leaves the counter unchanged with probability $1 - 1/(n_{i+1} - n_i)$.

If we select $n_i = i$ for all $i \ge 0$, then the counter is an ordinary one. More
interesting situations arise if we select, say, $n_i = 2^{i-1}$ for i > 0 or $n_i = F_i$ (the
ith Fibonacci number; see Section 3.2).

For this problem, assume that $n_{2^b - 1}$ is large enough that the probability of an
overflow error is negligible.
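The INCREMENT operation described above can be sketched in a few lines. This is a hypothetical illustration of the mechanism only, not a solution to the parts that follow. The function name `increment`, the parameter `n_of` (a function mapping a counter value i to the count $n_i$), and the use of Python's `random` module are all assumptions.

```python
import random


def increment(i, n_of, b, rng=random):
    """Apply one probabilistic INCREMENT to counter value i and return the new value.

    n_of(i) gives the count n_i represented by counter value i (assumed increasing).
    b is the number of bits in the counter.
    """
    if i == 2 ** b - 1:
        raise OverflowError("counter overflow")
    gap = n_of(i + 1) - n_of(i)
    # Advance with probability 1/gap; otherwise leave the counter unchanged.
    if rng.random() < 1 / gap:
        return i + 1
    return i


# With n_i = i the gap is always 1, so the counter behaves like an ordinary one.
value = 0
for _ in range(10):
    value = increment(value, lambda i: i, b=8)
print(value)
```

Passing a different `n_of`, such as `lambda i: 100 * i`, gives the probabilistic behavior; the counter then advances only occasionally.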

a.  Show  that  the  expected  value  represented  by  the  counter  after  n  INCREMENT 
operations  have  been  performed  is  exactly  n. 

b. The analysis of the variance of the count represented by the counter depends
on the sequence of the $n_i$. Let us consider a simple case: $n_i = 100i$ for
all $i \ge 0$. Estimate the variance in the value represented by the register after n
INCREMENT operations have been performed.

5-2  Searching  an  unsorted  array 

This  problem  examines  three  algorithms  for  searching  for  a  value  x  in  an  unsorted 
array  A  consisting  of  n  elements. 

Consider  the  following  randomized  strategy:  pick  a  random  index  i  into  A.  If 
A[i]  =  x,  then  we  terminate;  otherwise,  we  continue  the  search  by  picking  a  new 
random  index  into  A.  We  continue  picking  random  indices  into  A  until  we  find  an 
index j such that A[j] = x or until we have checked every element of A. Note
that we pick from the whole set of indices each time, so that we may examine a
given element more than once.

a. Write pseudocode for a procedure Random-Search to implement the strategy
above. Be sure that your algorithm terminates when all indices into A have
been picked.

b. Suppose that there is exactly one index i such that A[i] = x. What is the
expected number of indices into A that we must pick before we find x and
Random-Search terminates?

c. Generalizing your solution to part (b), suppose that there are $k \ge 1$ indices i
such that A[i] = x. What is the expected number of indices into A that we
must  pick  before  we  find  x  and  Random-Search  terminates?  Your  answer 
should  be  a  function  of  n  and  k. 

d.  Suppose  that  there  are  no  indices  i  such  that  A[i]  =  x.  What  is  the  expected 
number  of  indices  into  A  that  we  must  pick  before  we  have  checked  all  elements 
of  A  and  Random-Search  terminates? 

Now consider a deterministic linear search algorithm, which we refer to as
Deterministic-Search. Specifically, the algorithm searches A for x in order,
considering A[1], A[2], A[3], ..., A[n] until either it finds A[i] = x or it reaches
the end of the array. Assume that all possible permutations of the input array are
equally likely.

e.  Suppose  that  there  is  exactly  one  index  i  such  that  A[i]  =  x.  What  is  the 
average-case  running  time  of  Deterministic-Search?  What  is  the  worst- 
case  running  time  of  Deterministic-Search? 

f. Generalizing your solution to part (e), suppose that there are $k \ge 1$ indices i
such that A[i] = x. What is the average-case running time of Deterministic-Search?
What is the worst-case running time of Deterministic-Search?
Your answer should be a function of n and k.

g.  Suppose  that  there  are  no  indices  i  such  that  A[i]  =  x.  What  is  the  average-case 
running  time  of  Deterministic-Search?  What  is  the  worst-case  running 
time  of  Deterministic-Search? 

Finally, consider a randomized algorithm Scramble-Search that works by
first randomly permuting the input array and then running the deterministic linear
search given above on the resulting permuted array.

h. Letting k be the number of indices i such that A[i] = x, give the worst-case and
expected running times of Scramble-Search for the cases in which k = 0
and k = 1. Generalize your solution to handle the case in which $k \ge 1$.

i. Which of the three searching algorithms would you use? Explain your answer.


Chapter notes

Bollobás [53], Hofri [174], and Spencer [321] contain a wealth of advanced
probabilistic techniques. The advantages of randomized algorithms are discussed and
surveyed by Karp [200] and Rabin [288]. The textbook by Motwani and Raghavan
[262] gives an extensive treatment of randomized algorithms.

Several variants of the hiring problem have been widely studied. These problems
are more commonly referred to as “secretary problems.” An example of work in
this area is the paper by Ajtai, Megiddo, and Waarts [11].


Introduction


This part presents several algorithms that solve the following sorting problem:

Input: A sequence of n numbers $\langle a_1, a_2, \ldots, a_n \rangle$.

Output: A permutation (reordering) $\langle a'_1, a'_2, \ldots, a'_n \rangle$ of the input sequence such
that $a'_1 \le a'_2 \le \cdots \le a'_n$.

The input sequence is usually an n-element array, although it may be represented
in some other fashion, such as a linked list.

The  structure  of  the  data 

In  practice,  the  numbers  to  be  sorted  are  rarely  isolated  values.  Each  is  usually  part 
of  a  collection  of  data  called  a  record.  Each  record  contains  a  key,  which  is  the 
value to be sorted. The remainder of the record consists of satellite data, which are
usually  carried  around  with  the  key.  In  practice,  when  a  sorting  algorithm  permutes 
the  keys,  it  must  permute  the  satellite  data  as  well.  If  each  record  includes  a  large 
amount  of  satellite  data,  we  often  permute  an  array  of  pointers  to  the  records  rather 
than  the  records  themselves  in  order  to  minimize  data  movement. 
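The idea of permuting pointers rather than records can be illustrated briefly. The following Python sketch is a hypothetical example: the `Record` class and its fields are assumptions, and list indices stand in for pointers. It sorts an index array by key and leaves the records themselves where they are.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: int          # the value to be sorted
    satellite: str    # stands in for a large payload carried with the key


records = [Record(3, "c..."), Record(1, "a..."), Record(2, "b...")]

# Permute indices (the "pointers") instead of moving the records themselves.
order = sorted(range(len(records)), key=lambda i: records[i].key)

# Visiting the records through `order` yields them in sorted key order.
sorted_keys = [records[i].key for i in order]
print(order, sorted_keys)
```

Only the small index array is rearranged, so the amount of data moved does not depend on the size of the satellite data.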

In  a  sense,  it  is  these  implementation  details  that  distinguish  an  algorithm  from 
a  full-blown  program.  A  sorting  algorithm  describes  the  method  by  which  we 
determine  the  sorted  order,  regardless  of  whether  we  are  sorting  individual  numbers 
or  large  records  containing  many  bytes  of  satellite  data.  Thus,  when  focusing  on  the 
problem  of  sorting,  we  typically  assume  that  the  input  consists  only  of  numbers. 
Translating an algorithm for sorting numbers into a program for sorting records
is conceptually straightforward, although in a given engineering situation other
subtleties may make the actual programming task a challenge.

Why  sorting? 

Many  computer  scientists  consider  sorting  to  be  the  most  fundamental  problem  in 
the  study  of  algorithms.  There  are  several  reasons: 

•  Sometimes  an  application  inherently  needs  to  sort  information.  For  example, 
in  order  to  prepare  customer  statements,  banks  need  to  sort  checks  by  check 
number. 

•  Algorithms  often  use  sorting  as  a  key  subroutine.  For  example,  a  program  that 
renders  graphical  objects  which  are  layered  on  top  of  each  other  might  have 
to  sort  the  objects  according  to  an  “above”  relation  so  that  it  can  draw  these 
objects  from  bottom  to  top.  We  shall  see  numerous  algorithms  in  this  text  that 
use  sorting  as  a  subroutine. 

•  We can draw from among a wide variety of sorting algorithms, and they
employ a rich set of techniques. In fact, many important techniques used throughout
algorithm design appear in the body of sorting algorithms that have been
developed over the years. In this way, sorting is also a problem of historical
interest.

•  We  can  prove  a  nontrivial  lower  bound  for  sorting  (as  we  shall  do  in  Chapter  8). 
Our  best  upper  bounds  match  the  lower  bound  asymptotically,  and  so  we  know 
that  our  sorting  algorithms  are  asymptotically  optimal.  Moreover,  we  can  use 
the  lower  bound  for  sorting  to  prove  lower  bounds  for  certain  other  problems. 

•  Many engineering issues come to the fore when implementing sorting
algorithms. The fastest sorting program for a particular situation may depend on
many factors, such as prior knowledge about the keys and satellite data, the
memory hierarchy (caches and virtual memory) of the host computer, and the
software environment. Many of these issues are best dealt with at the algorithmic
level, rather than by “tweaking” the code.

Sorting  algorithms 

We introduced two algorithms that sort n real numbers in Chapter 2. Insertion sort
takes $\Theta(n^2)$ time in the worst case. Because its inner loops are tight, however,
it is a fast in-place sorting algorithm for small input sizes. (Recall that a sorting
algorithm sorts in place if only a constant number of elements of the input array
are ever stored outside the array.) Merge sort has a better asymptotic running
time, $\Theta(n \lg n)$, but the Merge procedure it uses does not operate in place.

In this part, we shall introduce two more algorithms that sort arbitrary real
numbers. Heapsort, presented in Chapter 6, sorts n numbers in place in $O(n \lg n)$ time.
It uses an important data structure, called a heap, with which we can also
implement a priority queue.

Quicksort, in Chapter 7, also sorts n numbers in place, but its worst-case running
time is Θ(n^2). Its expected running time is Θ(n lg n), however, and it generally
outperforms heapsort in practice. Like insertion sort, quicksort has tight code, and
so the hidden constant factor in its running time is small. It is a popular algorithm
for sorting large input arrays.

Insertion sort, merge sort, heapsort, and quicksort are all comparison sorts: they
determine the sorted order of an input array by comparing elements. Chapter 8
begins by introducing the decision-tree model in order to study the performance
limitations of comparison sorts. Using this model, we prove a lower bound of Ω(n lg n)
on the worst-case running time of any comparison sort on n inputs, thus showing
that heapsort and merge sort are asymptotically optimal comparison sorts.

Chapter 8 then goes on to show that we can beat this lower bound of Ω(n lg n)
if we can gather information about the sorted order of the input by means other
than comparing elements. The counting sort algorithm, for example, assumes that
the input numbers are in the set {0, 1, ..., k}. By using array indexing as a tool
for determining relative order, counting sort can sort n numbers in Θ(k + n) time.
Thus, when k = O(n), counting sort runs in time that is linear in the size of the
input array. A related algorithm, radix sort, can be used to extend the range of
counting sort. If there are n integers to sort, each integer has d digits, and each
digit can take on up to k possible values, then radix sort can sort the numbers
in Θ(d(n + k)) time. When d is a constant and k is O(n), radix sort runs in
linear time. A third algorithm, bucket sort, requires knowledge of the probabilistic
distribution of numbers in the input array. It can sort n real numbers uniformly
distributed in the half-open interval [0, 1) in average-case O(n) time.
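As a rough illustration of how array indexing can stand in for comparisons, here is a minimal Python sketch of counting sort for integers in {0, 1, ..., k}. The function name `counting_sort` and the use of 0-based Python lists are choices made for this example only; Chapter 8 presents the algorithm itself.

```python
def counting_sort(a, k):
    """Return a sorted copy of a, whose items are integers in 0..k."""
    count = [0] * (k + 1)
    for x in a:                      # count[v] = number of items equal to v
        count[x] += 1
    for v in range(1, k + 1):        # count[v] = number of items <= v
        count[v] += count[v - 1]
    out = [0] * len(a)
    for x in reversed(a):            # scanning backward keeps equal keys in order
        count[x] -= 1
        out[count[x]] = x
    return out


print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))
```

No two input elements are ever compared with each other; the two passes over the input and the one pass over the count array account for the Θ(k + n) running time.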

The  following  table  summarizes  the  running  times  of  the  sorting  algorithms  from 
Chapters  2  and  6-8.  As  usual,  n  denotes  the  number  of  items  to  sort.  For  counting 
sort,  the  items  to  sort  are  integers  in  the  set  {0, 1, . . . ,  k}.  For  radix  sort,  each  item 
is a d-digit number, where each digit takes on k possible values. For bucket sort,
we  assume  that  the  keys  are  real  numbers  uniformly  distributed  in  the  half-open 
interval  [0,  1).  The  rightmost  column  gives  the  average-case  or  expected  running 
time,  indicating  which  it  gives  when  it  differs  from  the  worst-case  running  time. 
We omit the average-case running time of heapsort because we do not analyze it in
this book.

Algorithm        Worst-case        Average-case/expected
                 running time      running time
---------------  ----------------  ----------------------
Insertion sort   Θ(n^2)            Θ(n^2)
Merge sort       Θ(n lg n)         Θ(n lg n)
Heapsort         O(n lg n)         -
Quicksort        Θ(n^2)            Θ(n lg n) (expected)
Counting sort    Θ(k + n)          Θ(k + n)
Radix sort       Θ(d(n + k))       Θ(d(n + k))
Bucket sort      Θ(n^2)            Θ(n) (average-case)

Order  statistics 

The ith order statistic of a set of n numbers is the ith smallest number in the set.
We can, of course, select the ith order statistic by sorting the input and indexing
the ith element of the output. With no assumptions about the input distribution,
this method runs in Ω(n lg n) time, as the lower bound proved in Chapter 8 shows.

In Chapter 9, we show that we can find the ith smallest element in O(n) time,
even when the elements are arbitrary real numbers. We present a randomized
algorithm with tight pseudocode that runs in Θ(n^2) time in the worst case, but whose
expected running time is O(n). We also give a more complicated algorithm that
runs in O(n) worst-case time.

Background 

Although most of this part does not rely on difficult mathematics, some sections
do require mathematical sophistication. In particular, analyses of quicksort, bucket
sort, and the order-statistic algorithm use probability, which is reviewed in
Appendix C, and the material on probabilistic analysis and randomized algorithms in
Chapter 5. The analysis of the worst-case linear-time algorithm for order
statistics involves somewhat more sophisticated mathematics than the other worst-case
analyses in this part.

In this chapter, we introduce another sorting algorithm: heapsort. Like merge sort,
but unlike insertion sort, heapsort’s running time is O(n lg n). Like insertion sort,
but unlike merge sort, heapsort sorts in place: only a constant number of array
elements are stored outside the input array at any time. Thus, heapsort combines
the better attributes of the two sorting algorithms we have already discussed.

Heapsort also introduces another algorithm design technique: using a data
structure, in this case one we call a “heap,” to manage information. Not only is the heap
data structure useful for heapsort, but it also makes an efficient priority queue. The
heap data structure will reappear in algorithms in later chapters.

The term “heap” was originally coined in the context of heapsort, but it has since
come to refer to “garbage-collected storage,” such as the programming languages
Java and Lisp provide. Our heap data structure is not garbage-collected storage,
and whenever we refer to heaps in this book, we shall mean a data structure rather
than an aspect of garbage collection.

Figure 6.1  A max-heap viewed as (a) a binary tree and (b) an array. The number within the circle
at each node in the tree is the value stored at that node. The number above a node is the corresponding
index in the array. Above and below the array are lines showing parent-child relationships; parents
are always to the left of their children. The tree has height three; the node at index 4 (with value 8)
has height one.

The (binary) heap data structure is an array object that we can view as a
nearly complete binary tree (see Section B.5.3), as shown in Figure 6.1. Each
node of the tree corresponds to an element of the array. The tree is
completely filled on all levels except possibly the lowest, which is filled from the
left up to a point. An array A that represents a heap is an object with two
attributes: A.length, which (as usual) gives the number of elements in the array, and
A.heap-size, which represents how many elements in the heap are stored within
array A. That is, although A[1..A.length] may contain numbers, only the
elements in A[1..A.heap-size], where 0 ≤ A.heap-size ≤ A.length, are valid
elements of the heap. The root of the tree is A[1], and given the index i of a node, we
can easily compute the indices of its parent, left child, and right child:


Parent(i)
1  return ⌊i/2⌋

Left(i)
1  return 2i

Right(i)
1  return 2i + 1

On most computers, the Left procedure can compute 2i in one instruction by
simply shifting the binary representation of i left by one bit position. Similarly, the
Right procedure can quickly compute 2i + 1 by shifting the binary representation
of i left by one bit position and then adding in a 1 as the low-order bit. The
Parent procedure can compute ⌊i/2⌋ by shifting i right one bit position. Good
implementations of heapsort often implement these procedures as “macros” or
“inline” procedures.
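The shift-based index computations just described can be sketched in Python as follows. The lowercase function names are our own, and the indices are 1-based to match the pseudocode; a 0-based variant would need different formulas.

```python
def parent(i):
    """Index of the parent of node i (1-based): floor(i / 2) via a right shift."""
    return i >> 1


def left(i):
    """Index of the left child of node i (1-based): 2i via a left shift."""
    return i << 1


def right(i):
    """Index of the right child: shift left, then set the low-order bit."""
    return (i << 1) | 1


print(parent(5), left(2), right(2))  # node 5's parent; node 2's two children
```

Setting the low-order bit with `|` is safe here because the shift always leaves that bit zero.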

There are two kinds of binary heaps: max-heaps and min-heaps. In both kinds,
the values in the nodes satisfy a heap property, the specifics of which depend on
the kind of heap. In a max-heap, the max-heap property is that for every node i
other than the root,

A[Parent(i)] ≥ A[i] ,

that is, the value of a node is at most the value of its parent. Thus, the largest
element in a max-heap is stored at the root, and the subtree rooted at a node contains
values no larger than that contained at the node itself. A min-heap is organized in
the opposite way; the min-heap property is that for every node i other than the
root,

A[Parent(i)] ≤ A[i] .

The smallest element in a min-heap is at the root.
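The two heap properties translate directly into checks over an array. The sketch below assumes a 0-based Python list, so the parent of index i is (i - 1) // 2 rather than ⌊i/2⌋; the function names and the sample values are illustrative only.

```python
def is_max_heap(a):
    """True if every non-root element is at most its parent (0-based list)."""
    return all(a[(i - 1) // 2] >= a[i] for i in range(1, len(a)))


def is_min_heap(a):
    """True if every non-root element is at least its parent (0-based list)."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))


sample = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]   # illustrative values
print(is_max_heap(sample), is_min_heap(sample))
```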

For the heapsort algorithm, we use max-heaps. Min-heaps commonly
implement priority queues, which we discuss in Section 6.5. We shall be precise in
specifying whether we need a max-heap or a min-heap for any particular
application, and when properties apply to either max-heaps or min-heaps, we just use the
term “heap.”

Viewing a heap as a tree, we define the height of a node in a heap to be the
number of edges on the longest simple downward path from the node to a leaf, and
we define the height of the heap to be the height of its root. Since a heap of n
elements is based on a complete binary tree, its height is Θ(lg n) (see Exercise 6.1-2).
We shall see that the basic operations on heaps run in time at most proportional
to the height of the tree and thus take O(lg n) time. The remainder of this chapter
presents some basic procedures and shows how they are used in a sorting algorithm
and a priority-queue data structure.

•  The Max-Heapify procedure, which runs in O(lg n) time, is the key to
maintaining the max-heap property.

•  The Build-Max-Heap procedure, which runs in linear time, produces a
max-heap from an unordered input array.

•  The Heapsort procedure, which runs in O(n lg n) time, sorts an array in
place.

•  The Max-Heap-Insert, Heap-Extract-Max, Heap-Increase-Key,
and Heap-Maximum procedures, which run in O(lg n) time, allow the heap
data structure to implement a priority queue.

Exercises 


6.1-1 

What are the minimum and maximum numbers of elements in a heap of height h?


6.1-2

Show that an n-element heap has height ⌊lg n⌋.


6.1-3

Show that in any subtree of a max-heap, the root of the subtree contains the largest
value occurring anywhere in that subtree.


6.1-4

Where in a max-heap might the smallest element reside, assuming that all elements
are distinct?


6.1-5

Is an array that is in sorted order a min-heap?


6.1-6

Is the array with values (23, 17, 14, 6, 13, 10, 1, 5, 7, 12) a max-heap?


6.1-7

Show that, with the array representation for storing an n-element heap, the leaves
are the nodes indexed by ⌊n/2⌋ + 1, ⌊n/2⌋ + 2, ..., n.


6.2  Maintaining  the  heap  property 

In order to maintain the max-heap property, we call the procedure Max-Heapify.
Its inputs are an array A and an index i into the array. When it is called,
Max-Heapify assumes that the binary trees rooted at Left(i) and Right(i) are
max-heaps, but that A[i] might be smaller than its children, thus violating the max-heap
property. Max-Heapify lets the value at A[i] “float down” in the max-heap so
that the subtree rooted at index i obeys the max-heap property.

Max-Heapify(A, i)
 1  l = Left(i)
 2  r = Right(i)
 3  if l ≤ A.heap-size and A[l] > A[i]
 4      largest = l
 5  else largest = i
 6  if r ≤ A.heap-size and A[r] > A[largest]
 7      largest = r
 8  if largest ≠ i
 9      exchange A[i] with A[largest]
10      Max-Heapify(A, largest)

Figure 6.2  The action of Max-Heapify(A, 2), where A.heap-size = 10. (a) The initial
configuration, with A[2] at node i = 2 violating the max-heap property since it is not larger than
both children. The max-heap property is restored for node 2 in (b) by exchanging A[2] with A[4],
which destroys the max-heap property for node 4. The recursive call Max-Heapify(A, 4) now
has i = 4. After swapping A[4] with A[9], as shown in (c), node 4 is fixed up, and the recursive call
Max-Heapify(A, 9) yields no further change to the data structure.

Figure 6.2 illustrates the action of Max-Heapify. At each step, the largest of
the elements A[i], A[Left(i)], and A[Right(i)] is determined, and its index is
stored in largest. If A[i] is largest, then the subtree rooted at node i is already a
max-heap and the procedure terminates. Otherwise, one of the two children has the
largest element, and A[i] is swapped with A[largest], which causes node i and its
children to satisfy the max-heap property. The node indexed by largest, however,
now has the original value A[i], and thus the subtree rooted at largest might violate
the max-heap property. Consequently, we call Max-Heapify recursively on that
subtree.
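The float-down step can be sketched in Python as below. This is a transcription under stated assumptions, not the book's code: the list is 0-based (children of i are 2i + 1 and 2i + 2), and the heap size is passed as an explicit `heap_size` argument instead of being an attribute of the array. The sample values are illustrative.

```python
def max_heapify(a, i, heap_size):
    """Float a[i] down until the subtree rooted at i is a max-heap.

    Assumes the subtrees rooted at the children of i are already max-heaps.
    """
    l, r = 2 * i + 1, 2 * i + 2
    largest = l if l < heap_size and a[l] > a[i] else i
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)   # the moved value may violate the property below


a = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]        # only index 1 (value 4) is out of place
max_heapify(a, 1, len(a))
print(a)  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```

In this run the value 4 is exchanged first with 14 and then with 8, after which it is a leaf and the recursion stops.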

The running time of Max-Heapify on a subtree of size n rooted at a given
node i is the Θ(1) time to fix up the relationships among the elements A[i],
A[Left(i)], and A[Right(i)], plus the time to run Max-Heapify on a subtree
rooted at one of the children of node i (assuming that the recursive call occurs).
The children’s subtrees each have size at most 2n/3 (the worst case occurs when
the bottom level of the tree is exactly half full), and therefore we can describe the
running time of Max-Heapify by the recurrence

T(n) ≤ T(2n/3) + Θ(1) .

The solution to this recurrence, by case 2 of the master theorem (Theorem 4.1),
is T(n) = O(lg n). Alternatively, we can characterize the running time of
Max-Heapify on a node of height h as O(h).
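To make the appeal to the master theorem explicit, the following worked step treats the recurrence as an equality of the form T(n) = aT(n/b) + f(n) and reads off the parameters. The names a, b, and f(n) follow the usual statement of the theorem and are an assumption here, since this section does not restate Theorem 4.1.

```latex
\begin{align*}
a &= 1, \qquad b = \tfrac{3}{2}, \qquad f(n) = \Theta(1), \\
n^{\log_b a} &= n^{\log_{3/2} 1} = n^{0} = 1, \\
f(n) &= \Theta\bigl(n^{\log_b a}\bigr)
  \;\Longrightarrow\;
  T(n) = \Theta\bigl(n^{\log_b a} \lg n\bigr) = \Theta(\lg n).
\end{align*}
```

Because the actual recurrence is an inequality, this gives the upper bound T(n) = O(lg n).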

Exercises 


6.2-1 

Using Figure 6.2 as a model, illustrate the operation of Max-Heapify(A, 3) on
the array A = (27, 17, 3, 16, 13, 10, 1, 5, 7, 12, 4, 8, 9, 0).


6.2-2

Starting with the procedure Max-Heapify, write pseudocode for the procedure
Min-Heapify(A, i), which performs the corresponding manipulation on a
min-heap. How does the running time of Min-Heapify compare to that of
Max-Heapify?


6.2-3

What is the effect of calling Max-Heapify(A, i) when the element A[i] is larger
than its children?


6.2-4

What is the effect of calling Max-Heapify(A, i) for i > A.heap-size/2?


6.2-5

The code for Max-Heapify is quite efficient in terms of constant factors, except
possibly for the recursive call in line 10, which might cause some compilers to
produce inefficient code. Write an efficient Max-Heapify that uses an iterative
control construct (a loop) instead of recursion.


6.2-6

Show that the worst-case running time of Max-Heapify on a heap of size n
is Ω(lg n). (Hint: For a heap with n nodes, give node values that cause
Max-Heapify to be called recursively at every node on a simple path from the root
down to a leaf.)


6.3  Building  a  heap 

We can use the procedure Max-Heapify in a bottom-up manner to convert an
array A[1..n], where n = A.length, into a max-heap. By Exercise 6.1-7, the
elements in the subarray A[(⌊n/2⌋ + 1)..n] are all leaves of the tree, and so each is
a 1-element heap to begin with. The procedure Build-Max-Heap goes through
the remaining nodes of the tree and runs Max-Heapify on each one.

Build-Max-Heap(A)
1  A.heap-size = A.length
2  for i = ⌊A.length/2⌋ downto 1
3      Max-Heapify(A, i)

Figure  6.3  shows  an  example  of  the  action  of  Build-Max-Heap. 
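A minimal Python sketch of the bottom-up construction follows. It assumes a 0-based list, so the last non-leaf node is at index n // 2 - 1 rather than ⌊n/2⌋, and it passes the heap size as an argument; both are adaptations for this example. The input values are illustrative.

```python
def max_heapify(a, i, heap_size):
    """Float a[i] down so the subtree rooted at i is a max-heap (0-based)."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = l if l < heap_size and a[l] > a[i] else i
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)


def build_max_heap(a):
    """Rearrange a in place into a max-heap, visiting non-leaf nodes bottom-up."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # indices n//2 .. n-1 are leaves
        max_heapify(a, i, n)


a = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_max_heap(a)
print(a)  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```

Running the loop in decreasing index order matters: each call to `max_heapify` relies on both child subtrees already being max-heaps.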

To  show  why  Build-Max-Heap  works  correctly,  we  use  the  following  loop 
invariant: 

At  the  start  of  each  iteration  of  the  for  loop  of  lines  2-3,  each  node  i  +  1, 
i  +  2, . . . ,  n  is  the  root  of  a  max-heap. 

We  need  to  show  that  this  invariant  is  true  prior  to  the  first  loop  iteration,  that  each 
iteration  of  the  loop  maintains  the  invariant,  and  that  the  invariant  provides  a  useful 
property  to  show  correctness  when  the  loop  terminates. 

Initialization: Prior to the first iteration of the loop, i = ⌊n/2⌋. Each node
⌊n/2⌋ + 1, ⌊n/2⌋ + 2, ..., n is a leaf and is thus the root of a trivial max-heap.

Maintenance: To see that each iteration maintains the loop invariant, observe that
the children of node i are numbered higher than i. By the loop invariant,
therefore, they are both roots of max-heaps. This is precisely the condition required
for the call Max-Heapify(A, i) to make node i a max-heap root. Moreover,
the Max-Heapify call preserves the property that nodes i + 1, i + 2, ..., n
are all roots of max-heaps. Decrementing i in the for loop update reestablishes
the loop invariant for the next iteration.

Termination: At termination, i = 0. By the loop invariant, each node 1, 2, ..., n
is the root of a max-heap. In particular, node 1 is.

We can compute a simple upper bound on the running time of Build-Max-Heap
as follows. Each call to Max-Heapify costs O(lg n) time, and
Build-Max-Heap makes O(n) such calls. Thus, the running time is O(n lg n). This
upper bound, though correct, is not asymptotically tight.

We can derive a tighter bound by observing that the time for Max-Heapify to
run at a node varies with the height of the node in the tree, and the heights of most
nodes are small. Our tighter analysis relies on the properties that an n-element heap
has height ⌊lg n⌋ (see Exercise 6.1-2) and at most ⌈n/2^(h+1)⌉ nodes of any height h
(see Exercise 6.3-3).

Figure 6.3  The operation of Build-Max-Heap, showing the data structure before the call to
Max-Heapify in line 3 of Build-Max-Heap. (a) A 10-element input array A and the
binary tree it represents. The figure shows that the loop index i refers to node 5 before the call
Max-Heapify(A, i). (b) The data structure that results. The loop index i for the next iteration
refers to node 4. (c)-(e) Subsequent iterations of the for loop in Build-Max-Heap. Observe that
whenever Max-Heapify is called on a node, the two subtrees of that node are both max-heaps.
(f) The max-heap after Build-Max-Heap finishes.

The time required by Max-Heapify when called on a node of height h is O(h),
and so we can express the total cost of Build-Max-Heap as being bounded from
above by

\[
  \sum_{h=0}^{\lfloor \lg n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil O(h)
  = O\!\left( n \sum_{h=0}^{\lfloor \lg n \rfloor} \frac{h}{2^{h}} \right) .
\]

We evaluate the last summation by substituting x = 1/2 in the formula (A.8),
yielding

\[
  \sum_{h=0}^{\infty} \frac{h}{2^{h}} = \frac{1/2}{(1 - 1/2)^{2}} = 2 .
\]

Thus, we can bound the running time of Build-Max-Heap as

\[
  O\!\left( n \sum_{h=0}^{\lfloor \lg n \rfloor} \frac{h}{2^{h}} \right)
  = O\!\left( n \sum_{h=0}^{\infty} \frac{h}{2^{h}} \right)
  = O(n) .
\]


Hence,  we  can  build  a  max-heap  from  an  unordered  array  in  linear  time. 
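As a quick numerical sanity check on the analysis above (not a proof), the Python sketch below evaluates partial sums of h/2^h and the height-weighted node-count bound for a concrete n. The function names are ours, and the comparison against 2n in the usage lines is only an illustration of linear growth.

```python
def series_partial_sum(terms):
    """Sum of h / 2**h for h = 0 .. terms - 1; approaches 2 as terms grows."""
    return sum(h / 2 ** h for h in range(terms))


def build_cost_bound(n):
    """Sum over heights h = 0 .. floor(lg n) of ceil(n / 2**(h+1)) * h, for n >= 1."""
    height = n.bit_length() - 1                 # floor(lg n)
    return sum(-(-n // 2 ** (h + 1)) * h        # -(-x // y) is integer ceiling division
               for h in range(height + 1))


print(series_partial_sum(60))                   # very close to 2
print(build_cost_bound(1024), 2 * 1024)         # the bound stays below 2n here
```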

We  can  build  a  min-heap  by  the  procedure  Build-Min-Heap,  which  is  the 
same  as  Build-Max-Heap  but  with  the  call  to  Max-Heapify  in  line  3  replaced 
by  a  call  to  Min-Heapify  (see  Exercise  6.2-2).  Build-Min-Heap  produces  a 
min-heap  from  an  unordered  linear  array  in  linear  time. 


Exercises 


6.3-1 

Using  Figure  6.3  as  a  model,  illustrate  the  operation  of  Build-Max-Heap  on  the 
array  A  =  (5,  3,  17,  10,  84,  19,  6,  22,  9). 


6.3-2 

Why do we want the loop index i in line 2 of Build-Max-Heap to decrease from
⌊A.length/2⌋ to 1 rather than increase from 1 to ⌊A.length/2⌋?


6.3-3

Show that there are at most ⌈n/2^(h+1)⌉ nodes of height h in any n-element heap.


6.4  The  heapsort  algorithm 

The heapsort algorithm starts by using Build-Max-Heap to build a max-heap
on the input array A[1..n], where n = A.length. Since the maximum element
of the array is stored at the root A[1], we can put it into its correct final position
by exchanging it with A[n]. If we now discard node n from the heap (and we
can do so by simply decrementing A.heap-size), we observe that the children of
the root remain max-heaps, but the new root element might violate the max-heap
property. All we need to do to restore the max-heap property, however, is call
Max-Heapify(A, 1), which leaves a max-heap in A[1..n - 1]. The heapsort
algorithm then repeats this process for the max-heap of size n - 1 down to a heap
of size 2. (See Exercise 6.4-2 for a precise loop invariant.)

Heapsort(A)
1  Build-Max-Heap(A)
2  for i = A.length downto 2
3      exchange A[1] with A[i]
4      A.heap-size = A.heap-size - 1
5      Max-Heapify(A, 1)

Figure 6.4 shows an example of the operation of Heapsort after line 1 has built
the initial max-heap. The figure shows the max-heap before the first iteration of
the for loop of lines 2-5 and after each iteration.

The Heapsort procedure takes time O(n lg n), since the call to
Build-Max-Heap takes time O(n) and each of the n - 1 calls to Max-Heapify takes
time O(lg n).
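Putting the pieces together, a self-contained Python sketch of heapsort might look like the following. As before, it assumes 0-based lists and an explicit `heap_size` argument in place of the A.heap-size attribute, so the index arithmetic differs slightly from the pseudocode.

```python
def max_heapify(a, i, heap_size):
    """Float a[i] down so the subtree rooted at i is a max-heap (0-based)."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = l if l < heap_size and a[l] > a[i] else i
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)


def heapsort(a):
    """Sort the list a in place in increasing order and return it."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):     # build the initial max-heap
        max_heapify(a, i, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]         # move the current maximum into place
        max_heapify(a, 0, end)              # heap now occupies a[0 .. end-1]
    return a


print(heapsort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

Shrinking the `end` bound plays the role of decrementing the heap size: elements at index `end` and beyond are already in their final sorted positions.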

Exercises 


6.4-1 

Using  Figure  6.4  as  a  model,  illustrate  the  operation  of  Heapsort  on  the  array 
A = (5, 13, 2, 25, 7, 17, 20, 8, 4).


6.4-2

Argue the correctness of Heapsort using the following loop invariant:

At the start of each iteration of the for loop of lines 2-5, the subarray
A[1..i] is a max-heap containing the i smallest elements of A[1..n], and
the subarray A[i + 1..n] contains the n - i largest elements of A[1..n],
sorted.


6.4-3 

What  is  the  running  time  of  Heapsort  on  an  array  A  of  length  n  that  is  already 
sorted  in  increasing  order?  What  about  decreasing  order? 


6.4-4 

Show that the worst-case running time of Heapsort is Ω(n lg n).


Figure 6.4  The operation of Heapsort. (a) The max-heap data structure just after
Build-Max-Heap has built it in line 1. (b)-(j) The max-heap just after each call of Max-Heapify in line 5,
showing the value of i at that time. Only lightly shaded nodes remain in the heap. (k) The resulting
sorted array A.


6.4-5 *

Show that when all elements are distinct, the best-case running time of Heapsort
is Ω(n lg n).


6.5  Priority  queues 

Heapsort is an excellent algorithm, but a good implementation of quicksort,
presented in Chapter 7, usually beats it in practice. Nevertheless, the heap data
structure itself has many uses. In this section, we present one of the most popular
applications of a heap: as an efficient priority queue. As with heaps, priority queues
come in two forms: max-priority queues and min-priority queues. We will focus
here on how to implement max-priority queues, which are in turn based on
max-heaps; Exercise 6.5-3 asks you to write the procedures for min-priority queues.

A  priority  queue  is  a  data  structure  for  maintaining  a  set  S  of  elements,  each 
with  an  associated  value  called  a  key.  A  max-priority  queue  supports  the  following 
operations: 

Insert(S, x) inserts the element x into the set S, which is equivalent to the
operation S = S ∪ {x}.

Maximum(S) returns the element of S with the largest key.

Extract-Max(S) removes and returns the element of S with the largest key.

Increase-Key(S, x, k) increases the value of element x’s key to the new value k,
which is assumed to be at least as large as x’s current key value.

Among their other applications, we can use max-priority queues to schedule
jobs  on  a  shared  computer.  The  max-priority  queue  keeps  track  of  the  jobs  to 
be  performed  and  their  relative  priorities.  When  a  job  is  finished  or  interrupted, 
the  scheduler  selects  the  highest-priority  job  from  among  those  pending  by  calling 
Extract-Max.  The  scheduler  can  add  a  new  job  to  the  queue  at  any  time  by 
calling  Insert. 

Alternatively, a min-priority queue supports the operations Insert, Minimum,
Extract-Min, and Decrease-Key. A min-priority queue can be used in an
event-driven simulator. The items in the queue are events to be simulated, each
with an associated time of occurrence that serves as its key. The events must be
simulated in order of their time of occurrence, because the simulation of an event
can cause other events to be simulated in the future. The simulation program calls
Extract-Min at each step to choose the next event to simulate. As new events are
produced, the simulator inserts them into the min-priority queue by calling Insert.
We shall see other uses for min-priority queues, highlighting the Decrease-Key
operation, in Chapters 23 and 24.
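The event-driven simulation loop described above can be sketched with Python's standard `heapq` module, which maintains a min-heap inside an ordinary list. The event names, the `spawn` rule, and the 2.5-unit delay are hypothetical and exist only to give the loop something to process.

```python
import heapq


def simulate(initial_events, spawn):
    """Process (time, name) events in time order and return the processing order.

    spawn(time, name) returns the follow-up events caused by an event.
    """
    queue = list(initial_events)
    heapq.heapify(queue)                       # min-priority queue keyed on time
    processed = []
    while queue:
        time, name = heapq.heappop(queue)      # plays the role of Extract-Min
        processed.append((time, name))
        for event in spawn(time, name):
            heapq.heappush(queue, event)       # plays the role of Insert
    return processed


def spawn(time, name):
    # Hypothetical rule: every arrival schedules a departure 2.5 time units later.
    return [(time + 2.5, "depart")] if name == "arrive" else []


for event in simulate([(2.0, "arrive"), (1.0, "arrive")], spawn):
    print(event)
```

Events created during the run are interleaved correctly with those already pending, which a one-time sort of the initial events could not guarantee.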

Not surprisingly, we can use a heap to implement a priority queue. In a given
application, such as job scheduling or event-driven simulation, elements of a priority
queue correspond to objects in the application. We often need to determine which
application object corresponds to a given priority-queue element, and vice versa.
When we use a heap to implement a priority queue, therefore, we often need to
store a handle to the corresponding application object in each heap element. The
exact makeup of the handle (such as a pointer or an integer) depends on the
application. Similarly, we need to store a handle to the corresponding heap element
in each application object. Here, the handle would typically be an array index.
Because heap elements change locations within the array during heap operations,
an actual implementation, upon relocating a heap element, would also have to
update the array index in the corresponding application object. Because the details
of accessing application objects depend heavily on the application and its
implementation, we shall not pursue them here, other than noting that in practice, these
handles do need to be correctly maintained.
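One possible way to keep such handles consistent is sketched below; it is an illustration, not a prescribed design. Each heap element is a (key, object) pair, and a dictionary `pos` maps each application object to its current array index, updated on every exchange. The class name, the dictionary, and the requirement that application objects be hashable are all assumptions of this sketch.

```python
class HandleMaxHeap:
    """Max-heap of (key, obj) pairs that tracks each object's current index."""

    def __init__(self):
        self.heap = []     # heap[i] = (key, application object)
        self.pos = {}      # application object -> current index in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i          # keep both handles up to date
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self.heap[(i - 1) // 2][0] < self.heap[i][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def insert(self, key, obj):
        self.heap.append((key, obj))
        self.pos[obj] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def increase_key(self, obj, key):
        i = self.pos[obj]                      # the handle locates the heap element
        if key < self.heap[i][0]:
            raise ValueError("new key is smaller than current key")
        self.heap[i] = (key, obj)
        self._sift_up(i)


h = HandleMaxHeap()
for key, job in [(5, "backup"), (3, "report"), (8, "deploy")]:
    h.insert(key, job)
h.increase_key("report", 10)
print(h.heap[0], h.pos)
```

Without `pos`, raising the priority of a given job would first require a linear scan of the array to find it.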

Now we discuss how to implement the operations of a max-priority queue. The
procedure Heap-Maximum implements the Maximum operation in Θ(1) time.

Heap-Maximum(A)
1  return A[1]

The procedure Heap-Extract-Max implements the Extract-Max
operation. It is similar to the for loop body (lines 3-5) of the Heapsort procedure.

Heap-Extract-Max(A)
1  if A.heap-size < 1
2      error “heap underflow”
3  max = A[1]
4  A[1] = A[A.heap-size]
5  A.heap-size = A.heap-size - 1
6  Max-Heapify(A, 1)
7  return max

The running time of Heap-Extract-Max is O(lg n), since it performs only a
constant amount of work on top of the O(lg n) time for Max-Heapify.
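A Python sketch of these two operations is given below. It assumes the heap is a 0-based list whose length is the heap size, so removing the last element with `pop()` stands in for decrementing A.heap-size; raising `IndexError` for the underflow case is likewise our choice. The sample heap is illustrative.

```python
def max_heapify(a, i, heap_size):
    """Float a[i] down so the subtree rooted at i is a max-heap (0-based)."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = l if l < heap_size and a[l] > a[i] else i
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)


def heap_maximum(a):
    """Return the largest key without removing it."""
    return a[0]


def heap_extract_max(a):
    """Remove and return the largest key of the max-heap a."""
    if not a:
        raise IndexError("heap underflow")
    maximum = a[0]
    last = a.pop()                 # shrink the heap by one element
    if a:
        a[0] = last                # move the last leaf to the root
        max_heapify(a, 0, len(a))  # and let it float down
    return maximum


heap = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
print(heap_extract_max(heap), heap)  # 16 [14, 8, 10, 4, 7, 9, 3, 2, 1]
```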

The procedure Heap-Increase-Key implements the Increase-Key
operation. An index i into the array identifies the priority-queue element whose key we
wish to increase. The procedure first updates the key of element A[i] to its new
value. Because increasing the key of A[i] might violate the max-heap property,
the procedure then, in a manner reminiscent of the insertion loop (lines 5-7) of
Insertion-Sort from Section 2.1, traverses a simple path from this node toward
the root to find a proper place for the newly increased key. As
Heap-Increase-Key traverses this path, it repeatedly compares an element to its parent,
exchanging their keys and continuing if the element’s key is larger, and terminating if the
element’s key is smaller, since the max-heap property now holds. (See Exercise 6.5-5
for a precise loop invariant.)

Heap-Increase-Key(A, i, key)
1  if key < A[i]
2      error “new key is smaller than current key”
3  A[i] = key
4  while i > 1 and A[Parent(i)] < A[i]
5      exchange A[i] with A[Parent(i)]
6      i = Parent(i)

Figure 6.5 shows an example of a Heap-Increase-Key operation. The running
time of Heap-Increase-Key on an n-element heap is O(lg n), since the path
traced from the node updated in line 3 to the root has length O(lg n).
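The upward traversal can be sketched in Python as follows, again on a 0-based list, so the loop stops at index 0 and the parent of i is (i - 1) // 2. Using `ValueError` for the error case is an assumption of this example, and the sample heap is illustrative.

```python
def heap_increase_key(a, i, key):
    """Raise a[i] to key and restore the max-heap property by moving it up."""
    if key < a[i]:
        raise ValueError("new key is smaller than current key")
    a[i] = key
    while i > 0 and a[(i - 1) // 2] < a[i]:
        parent = (i - 1) // 2
        a[i], a[parent] = a[parent], a[i]   # exchange with the smaller parent
        i = parent


heap = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
heap_increase_key(heap, 8, 15)              # raise the key 4 at index 8 to 15
print(heap)  # [16, 15, 10, 14, 7, 9, 3, 2, 8, 1]
```

Here the new key passes its parent twice (first 8, then 14) and stops below the root, whose key 16 is larger.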

The procedure Max-Heap-Insert implements the Insert operation. It takes
as an input the key of the new element to be inserted into max-heap A. The
procedure first expands the max-heap by adding to the tree a new leaf whose key is −∞.
Then it calls Heap-Increase-Key to set the key of this new node to its correct
value and maintain the max-heap property.

Max-Heap-Insert(A, key)

1  A.heap-size = A.heap-size + 1
2  A[A.heap-size] = -∞
3  Heap-Increase-Key(A, A.heap-size, key)

The running time of Max-Heap-Insert on an n-element heap is O(lg n).
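A minimal Python sketch of insertion follows, again assuming a 0-based list whose length stands in for A.heap-size. Using `-math.inf` as the -∞ sentinel is an assumption that suits numeric keys; other key types would need a different sentinel.

```python
import math


def heap_increase_key(a, i, key):
    """Raise the key at 0-based index i to key and sift it toward the root."""
    if key < a[i]:
        raise ValueError("new key is smaller than current key")
    a[i] = key
    while i > 0 and a[(i - 1) // 2] < a[i]:
        parent = (i - 1) // 2
        a[i], a[parent] = a[parent], a[i]
        i = parent


def max_heap_insert(a, key):
    """Insert key into the max-heap stored in list a."""
    a.append(-math.inf)                     # new leaf with key -infinity
    heap_increase_key(a, len(a) - 1, key)   # raise it to its real value
```

Because the new leaf starts at -∞, the precondition check in `heap_increase_key` can never fail for a numeric key.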

In summary, a heap can support any priority-queue operation on a set of size n
in O(lg n) time.

Exercises 


6.5-1 

Illustrate the operation of Heap-Extract-Max on the heap
A = ⟨15, 13, 9, 5, 12, 8, 7, 4, 0, 6, 2, 1⟩.


Figure 6.5 The operation of Heap-Increase-Key. (a) The max-heap of Figure 6.4(a)
with a node whose index is i heavily shaded. (b) This node has its key increased
to 15. (c) After one iteration of the while loop of lines 4-6, the node and its
parent have exchanged keys, and the index i moves up to the parent. (d) The
max-heap after one more iteration of the while loop. At this point,
A[Parent(i)] ≥ A[i]. The max-heap property now holds and the procedure
terminates.


6.5-2 

Illustrate the operation of Max-Heap-Insert(A, 10) on the heap
A = ⟨15, 13, 9, 5, 12, 8, 7, 4, 0, 6, 2, 1⟩.


6.5-3

Write pseudocode for the procedures Heap-Minimum, Heap-Extract-Min,
Heap-Decrease-Key, and Min-Heap-Insert that implement a min-priority
queue with a min-heap.

6.5-4

Why do we bother setting the key of the inserted node to -∞ in line 2 of
Max-Heap-Insert when the next thing we do is increase its key to the desired
value?

6.5-5

Argue the correctness of Heap-Increase-Key using the following loop invariant:

At the start of each iteration of the while loop of lines 4-6, the subarray
A[1..A.heap-size] satisfies the max-heap property, except that there may
be one violation: A[i] may be larger than A[Parent(i)].

You may assume that the subarray A[1..A.heap-size] satisfies the max-heap
property at the time Heap-Increase-Key is called.


6.5-6 

Each  exchange  operation  on  line  5  of  Heap-Increase-Key  typically  requires 
three  assignments.  Show  how  to  use  the  idea  of  the  inner  loop  of  INSERTION- 
SORT  to  reduce  the  three  assignments  down  to  just  one  assignment. 


6.5-7 

Show  how  to  implement  a  first-in,  first-out  queue  with  a  priority  queue.  Show 
how  to  implement  a  stack  with  a  priority  queue.  (Queues  and  stacks  are  defined  in 
Section  10.1.) 


6.5-8 

The operation Heap-Delete(A, i) deletes the item in node i from heap A. Give
an implementation of Heap-Delete that runs in O(lg n) time for an n-element
max-heap.


6.5-9 

Give an O(n lg k)-time algorithm to merge k sorted lists into one sorted list,
where n is the total number of elements in all the input lists. (Hint: Use a
min-heap for k-way merging.)


Problems 


6-1  Building  a  heap  using  insertion 

We can build a heap by repeatedly calling Max-Heap-Insert to insert the
elements into the heap. Consider the following variation on the Build-Max-Heap
procedure:

Build-Max-Heap'(A)

1  A.heap-size = 1
2  for i = 2 to A.length
3      Max-Heap-Insert(A, A[i])

a.  Do  the  procedures  Build-Max-Heap  and  Build-Max-Heap'  always  create 
the  same  heap  when  run  on  the  same  input  array?  Prove  that  they  do,  or  provide 
a  counterexample. 

b. Show that in the worst case, Build-Max-Heap' requires Θ(n lg n) time to
build an n-element heap.

6-2 Analysis of d-ary heaps

A d-ary heap is like a binary heap, but (with one possible exception) non-leaf
nodes have d children instead of 2 children.

a. How would you represent a d-ary heap in an array?

b. What is the height of a d-ary heap of n elements in terms of n and d?

c. Give an efficient implementation of Extract-Max in a d-ary max-heap.
Analyze its running time in terms of d and n.

d. Give an efficient implementation of Insert in a d-ary max-heap. Analyze its
running time in terms of d and n.

e. Give an efficient implementation of Increase-Key(A, i, k), which flags an
error if k < A[i], but otherwise sets A[i] = k and then updates the d-ary
max-heap structure appropriately. Analyze its running time in terms of d and n.

6-3  Young  tableaus 

An m × n Young tableau is an m × n matrix such that the entries of each row are
in sorted order from left to right and the entries of each column are in sorted
order from top to bottom. Some of the entries of a Young tableau may be ∞, which
we treat as nonexistent elements. Thus, a Young tableau can be used to hold
r ≤ mn finite numbers.

a. Draw a 4 × 4 Young tableau containing the elements
{9, 16, 3, 2, 4, 8, 5, 14, 12}.

b. Argue that an m × n Young tableau Y is empty if Y[1, 1] = ∞. Argue that Y
is full (contains mn elements) if Y[m, n] < ∞.

c. Give an algorithm to implement Extract-Min on a nonempty m × n Young
tableau that runs in O(m + n) time. Your algorithm should use a recursive
subroutine that solves an m × n problem by recursively solving either
an (m - 1) × n or an m × (n - 1) subproblem. (Hint: Think about
Max-Heapify.) Define T(p), where p = m + n, to be the maximum running time
of Extract-Min on any m × n Young tableau. Give and solve a recurrence
for T(p) that yields the O(m + n) time bound.

d. Show how to insert a new element into a nonfull m × n Young tableau in
O(m + n) time.

e. Using no other sorting method as a subroutine, show how to use an n × n Young
tableau to sort n² numbers in O(n³) time.

f. Give an O(m + n)-time algorithm to determine whether a given number is
stored in a given m × n Young tableau.


Chapter  notes 

The  heapsort  algorithm  was  invented  by  Williams  [357],  who  also  described  how 
to  implement  a  priority  queue  with  a  heap.  The  Build-Max-Heap  procedure 
was  suggested  by  Floyd  [106]. 

We use min-heaps to implement min-priority queues in Chapters 16, 23, and 24.
We also give an implementation with improved time bounds for certain operations
in Chapter 19 and, assuming that the keys are drawn from a bounded set of
nonnegative integers, Chapter 20.

If the data are b-bit integers, and the computer memory consists of addressable
b-bit words, Fredman and Willard [115] showed how to implement Minimum in
O(1) time and Insert and Extract-Min in O(√(lg n)) time. Thorup [337] has
improved the O(√(lg n)) bound to O(lg lg n) time. This bound uses an amount of
space unbounded in n, but it can be implemented in linear space by using
randomized hashing.

An important special case of priority queues occurs when the sequence of
Extract-Min operations is monotone, that is, the values returned by successive
Extract-Min operations are monotonically increasing over time. This case
arises in several important applications, such as Dijkstra’s single-source
shortest-paths algorithm, which we discuss in Chapter 24, and in discrete-event
simulation. For Dijkstra’s algorithm it is particularly important that the
Decrease-Key operation be implemented efficiently. For the monotone case, if the
data are integers in the range 1, 2, ..., C, Ahuja, Mehlhorn, Orlin, and
Tarjan [8] describe how to implement Extract-Min and Insert in O(lg C)
amortized time (see Chapter 17 for more on amortized analysis) and Decrease-Key
in O(1) time, using a data structure called a radix heap. The O(lg C) bound can
be improved to O(√(lg C)) using Fibonacci heaps (see Chapter 19) in conjunction
with radix heaps. Cherkassky, Goldberg, and Silverstein [65] further improved
the bound to O(lg^(1/3+ε) C) expected time by combining the multilevel bucketing
structure of Denardo and Fox [85] with the heap of Thorup mentioned earlier.
Raman [291] further improved these results to obtain a bound of
O(min(lg^(1/4+ε) C, lg^(1/3+ε) n)), for any fixed ε > 0.


7 Quicksort

The quicksort algorithm has a worst-case running time of Θ(n²) on an input array
of n numbers. Despite this slow worst-case running time, quicksort is often the
best practical choice for sorting because it is remarkably efficient on the
average: its expected running time is Θ(n lg n), and the constant factors hidden
in the Θ(n lg n) notation are quite small. It also has the advantage of sorting
in place (see page 17), and it works well even in virtual-memory environments.

Section 7.1 describes the algorithm and an important subroutine used by
quicksort for partitioning. Because the behavior of quicksort is complex, we
start with an intuitive discussion of its performance in Section 7.2 and
postpone its precise analysis to the end of the chapter. Section 7.3 presents a
version of quicksort that uses random sampling. This algorithm has a good
expected running time, and no particular input elicits its worst-case behavior.
Section 7.4 analyzes the randomized algorithm, showing that it runs in Θ(n²)
time in the worst case and, assuming distinct elements, in expected O(n lg n)
time.


7.1  Description  of  quicksort 

Quicksort, like merge sort, applies the divide-and-conquer paradigm introduced
in Section 2.3.1. Here is the three-step divide-and-conquer process for sorting
a typical subarray A[p..r].

Divide: Partition (rearrange) the array A[p..r] into two (possibly empty)
subarrays A[p..q-1] and A[q+1..r] such that each element of A[p..q-1] is
less than or equal to A[q], which is, in turn, less than or equal to each
element of A[q+1..r]. Compute the index q as part of this partitioning
procedure.

Conquer: Sort the two subarrays A[p..q-1] and A[q+1..r] by recursive calls
to quicksort.

Combine: Because the subarrays are already sorted, no work is needed to combine
them: the entire array A[p..r] is now sorted.

The  following  procedure  implements  quicksort: 

Quicksort(A, p, r)

1  if p < r
2      q = Partition(A, p, r)
3      Quicksort(A, p, q - 1)
4      Quicksort(A, q + 1, r)

To sort an entire array A, the initial call is Quicksort(A, 1, A.length).

Partitioning  the  array 

The key to the algorithm is the PARTITION procedure, which rearranges the
subarray A[p..r] in place.

Partition(A, p, r)

1  x = A[r]
2  i = p - 1
3  for j = p to r - 1
4      if A[j] ≤ x
5          i = i + 1
6          exchange A[i] with A[j]
7  exchange A[i + 1] with A[r]
8  return i + 1
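The two procedures above translate almost line for line into Python. The following is a sketch rather than a reference implementation: it assumes 0-based list indices (so the initial call covers indices 0 through len(a) - 1), and the default arguments on `quicksort` are a convenience added here.

```python
def partition(a, p, r):
    """Partition a[p..r] in place around the pivot a[r]; return its final index."""
    x = a[r]                      # pivot
    i = p - 1                     # end of the "<= pivot" region
    for j in range(p, r):         # j runs over p .. r - 1
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]   # place the pivot between the two regions
    return i + 1


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place; with no bounds given, sort the whole list."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)
```

Calling `partition` on [2, 8, 7, 1, 3, 5, 6, 4] with p = 0 and r = 7 rearranges the list to [2, 1, 3, 4, 7, 5, 6, 8] and returns 3, the 0-based position of the pivot 4.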

Figure  7.1  shows  how  PARTITION  works  on  an  8-element  array.  PARTITION 
always  selects  an  element  x  =  A[r]  as  a  pivot  element  around  which  to  partition  the 
subarray  A[p  . .  r].  As  the  procedure  runs,  it  partitions  the  array  into  four  (possibly 
empty)  regions.  At  the  start  of  each  iteration  of  the  for  loop  in  lines  3-6,  the  regions 
satisfy  certain  properties,  shown  in  Figure  7.2.  We  state  these  properties  as  a  loop 
invariant: 

At the beginning of each iteration of the loop of lines 3-6, for any array
index k,

1. If p ≤ k ≤ i, then A[k] ≤ x.

2. If i + 1 ≤ k ≤ j - 1, then A[k] > x.

3. If k = r, then A[k] = x.

The indices between j and r - 1 are not covered by any of the three cases, and
the values in these entries have no particular relationship to the pivot x.

Figure 7.1 The operation of PARTITION on a sample array. Array entry A[r]
becomes the pivot element x. Lightly shaded array elements are all in the first
partition with values no greater than x. Heavily shaded elements are in the
second partition with values greater than x. The unshaded elements have not yet
been put in one of the first two partitions, and the final white element is the
pivot x. (a) The initial array and variable settings. None of the elements have
been placed in either of the first two partitions. (b) The value 2 is “swapped
with itself” and put in the partition of smaller values. (c)-(d) The values 8
and 7 are added to the partition of larger values. (e) The values 1 and 8 are
swapped, and the smaller partition grows. (f) The values 3 and 7 are swapped,
and the smaller partition grows. (g)-(h) The larger partition grows to include
5 and 6, and the loop terminates. (i) In lines 7-8, the pivot element is swapped
so that it lies between the two partitions.

We need to show that this loop invariant is true prior to the first iteration,
that each iteration of the loop maintains the invariant, and that the invariant
provides a useful property to show correctness when the loop terminates.

Figure 7.2 The four regions maintained by the procedure PARTITION on a subarray
A[p..r]. The values in A[p..i] are all less than or equal to x, the values in
A[i+1..j-1] are all greater than x, and A[r] = x. The subarray A[j..r-1] can
take on any values.

Initialization: Prior to the first iteration of the loop, i = p - 1 and j = p.
Because no values lie between p and i and no values lie between i + 1 and j - 1,
the first two conditions of the loop invariant are trivially satisfied. The
assignment in line 1 satisfies the third condition.

Maintenance: As Figure 7.3 shows, we consider two cases, depending on the
outcome of the test in line 4. Figure 7.3(a) shows what happens when A[j] > x;
the only action in the loop is to increment j. After j is incremented,
condition 2 holds for A[j - 1] and all other entries remain unchanged.
Figure 7.3(b) shows what happens when A[j] ≤ x; the loop increments i, swaps
A[i] and A[j], and then increments j. Because of the swap, we now have that
A[i] ≤ x, and condition 1 is satisfied. Similarly, we also have that
A[j - 1] > x, since the item that was swapped into A[j - 1] is, by the loop
invariant, greater than x.

Termination:  At  termination,  j  =  r.  Therefore,  every  entry  in  the  array  is  in  one 
of  the  three  sets  described  by  the  invariant,  and  we  have  partitioned  the  values 
in  the  array  into  three  sets:  those  less  than  or  equal  to  x,  those  greater  than  x, 
and  a  singleton  set  containing  x. 
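The three conditions of the invariant can also be checked mechanically. The sketch below is an illustration, not part of the book's procedure: it assumes 0-based indices, and the name `partition_checked` is hypothetical. The assertions cost extra time and are there only to make the invariant visible.

```python
def partition_checked(a, p, r):
    """Lomuto-style partition of a[p..r] that asserts the loop invariant."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        # Invariant at the start of each iteration:
        assert all(a[k] <= x for k in range(p, i + 1))   # condition 1
        assert all(a[k] > x for k in range(i + 1, j))    # condition 2
        assert a[r] == x                                 # condition 3
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    # Termination (j = r): every entry except the pivot is classified.
    assert all(a[k] <= x for k in range(p, i + 1))
    assert all(a[k] > x for k in range(i + 1, r))
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1
```

Running it on a few inputs, including one with all elements equal, is a quick way to see that no assertion fires and that conditions 1 and 2 jointly cover A[p..r-1] when the loop ends.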

The final two lines of Partition finish up by swapping the pivot element with
the leftmost element greater than x, thereby moving the pivot into its correct
place in the partitioned array, and then returning the pivot’s new index. The
output of PARTITION now satisfies the specifications given for the divide step.
In fact, it satisfies a slightly stronger condition: after line 2 of QUICKSORT,
A[q] is strictly less than every element of A[q+1..r].

The running time of Partition on the subarray A[p..r] is Θ(n), where
n = r - p + 1 (see Exercise 7.1-3).

Exercises 


7.1-1 

Using Figure 7.1 as a model, illustrate the operation of Partition on the array
A = ⟨13, 19, 9, 5, 12, 8, 7, 4, 21, 2, 6, 11⟩.

Figure 7.3 The two cases for one iteration of procedure PARTITION. (a) If
A[j] > x, the only action is to increment j, which maintains the loop invariant.
(b) If A[j] ≤ x, index i is incremented, A[i] and A[j] are swapped, and then j
is incremented. Again, the loop invariant is maintained.


7.1-2 

What value of q does PARTITION return when all elements in the array A[p..r]
have the same value? Modify Partition so that q = ⌊(p + r)/2⌋ when all
elements in the array A[p..r] have the same value.

7.1-3

Give a brief argument that the running time of Partition on a subarray of size n
is Θ(n).

7.1-4

How would you modify Quicksort to sort into nonincreasing order?


7.2  Performance  of  quicksort 

The running time of quicksort depends on whether the partitioning is balanced or
unbalanced, which in turn depends on which elements are used for partitioning.
If the partitioning is balanced, the algorithm runs asymptotically as fast as
merge sort. If the partitioning is unbalanced, however, it can run
asymptotically as slowly as insertion sort. In this section, we shall informally
investigate how quicksort performs under the assumptions of balanced versus
unbalanced partitioning.

Worst-case  partitioning 

The worst-case behavior for quicksort occurs when the partitioning routine
produces one subproblem with n - 1 elements and one with 0 elements. (We prove
this claim in Section 7.4.1.) Let us assume that this unbalanced partitioning
arises in each recursive call. The partitioning costs Θ(n) time. Since the
recursive call on an array of size 0 just returns, T(0) = Θ(1), and the
recurrence for the running time is

\begin{align*}
T(n) &= T(n-1) + T(0) + \Theta(n) \\
     &= T(n-1) + \Theta(n) .
\end{align*}

Intuitively, if we sum the costs incurred at each level of the recursion, we get
an arithmetic series (equation (A.2)), which evaluates to Θ(n²). Indeed, it is
straightforward to use the substitution method to prove that the recurrence
T(n) = T(n - 1) + Θ(n) has the solution T(n) = Θ(n²). (See Exercise 7.2-1.)

Thus, if the partitioning is maximally unbalanced at every recursive level of
the algorithm, the running time is Θ(n²). Therefore the worst-case running time
of quicksort is no better than that of insertion sort. Moreover, the Θ(n²)
running time occurs when the input array is already completely sorted, a common
situation in which insertion sort runs in O(n) time.
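The quadratic behavior on sorted input is easy to observe by counting comparisons. The sketch below is illustrative only: it assumes the last-element pivot rule of PARTITION and 0-based indices, uses an explicit stack (an implementation choice made here so that a sorted input does not hit Python's recursion limit), and the function name is hypothetical.

```python
def count_comparisons(a):
    """Quicksort list a in place and return how many times an element
    was compared against a pivot during partitioning."""
    comparisons = 0
    stack = [(0, len(a) - 1)]      # pending subarrays a[p..r]
    while stack:
        p, r = stack.pop()
        if p >= r:
            continue
        x, i = a[r], p - 1
        for j in range(p, r):
            comparisons += 1
            if a[j] <= x:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[r] = a[r], a[i + 1]
        stack.append((p, i))       # left part  a[p..q-1], where q = i + 1
        stack.append((i + 2, r))   # right part a[q+1..r]
    return comparisons
```

On an already sorted list of n distinct keys, every pivot is the maximum of its subarray, so the subproblem sizes are n - 1 and 0 and the count is (n - 1) + (n - 2) + ... + 1 = n(n - 1)/2; for n = 1000 that is 499500.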

Best-case  partitioning 

In the most even possible split, PARTITION produces two subproblems, each of
size no more than n/2, since one is of size ⌊n/2⌋ and one of size ⌈n/2⌉ - 1. In
this case, quicksort runs much faster. The recurrence for the running time is
then

\[ T(n) = 2T(n/2) + \Theta(n) , \]

where we tolerate the sloppiness from ignoring the floor and ceiling and from
subtracting 1. By case 2 of the master theorem (Theorem 4.1), this recurrence
has the solution T(n) = Θ(n lg n). By equally balancing the two sides of the
partition at every level of the recursion, we get an asymptotically faster
algorithm.
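For reference, the master-theorem step can be spelled out. This is a short sketch in the usual notation T(n) = aT(n/b) + f(n), where a, b, and f are the master theorem's parameter names rather than symbols used in this section:

```latex
\begin{align*}
a &= 2, \qquad b = 2, \qquad f(n) = \Theta(n), \\
n^{\log_b a} &= n^{\log_2 2} = n, \\
f(n) = \Theta\bigl(n^{\log_b a}\bigr)
  &\;\Longrightarrow\; T(n) = \Theta\bigl(n^{\log_b a} \lg n\bigr) = \Theta(n \lg n).
\end{align*}
```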

Balanced  partitioning 

The average-case running time of quicksort is much closer to the best case than
to the worst case, as the analyses in Section 7.4 will show. The key to
understanding why is to understand how the balance of the partitioning is
reflected in the recurrence that describes the running time.

Figure 7.4 A recursion tree for QUICKSORT in which PARTITION always produces a
9-to-1 split, yielding a running time of O(n lg n). Nodes show subproblem sizes,
with per-level costs on the right. The per-level costs include the constant c
implicit in the Θ(n) term.

Suppose, for example, that the partitioning algorithm always produces a 9-to-1
proportional split, which at first blush seems quite unbalanced. We then obtain
the recurrence

\[ T(n) = T(9n/10) + T(n/10) + cn \]

on the running time of quicksort, where we have explicitly included the
constant c hidden in the Θ(n) term. Figure 7.4 shows the recursion tree for this
recurrence. Notice that every level of the tree has cost cn, until the recursion
reaches a boundary condition at depth log_10 n = Θ(lg n), and then the levels
have cost at most cn. The recursion terminates at depth log_{10/9} n = Θ(lg n).
The total cost of quicksort is therefore O(n lg n). Thus, with a 9-to-1
proportional split at every level of recursion, which intuitively seems quite
unbalanced, quicksort runs in O(n lg n) time, asymptotically the same as if the
split were right down the middle. Indeed, even a 99-to-1 split yields an
O(n lg n) running time. In fact, any split of constant proportionality yields a
recursion tree of depth Θ(lg n), where the cost at each level is O(n). The
running time is therefore O(n lg n) whenever the split has constant
proportionality.

Figure 7.5 (a) Two levels of a recursion tree for quicksort. The partitioning at
the root costs n and produces a “bad” split: two subarrays of sizes 0 and n - 1.
The partitioning of the subarray of size n - 1 costs n - 1 and produces a “good”
split: subarrays of size (n - 1)/2 - 1 and (n - 1)/2. (b) A single level of a
recursion tree that is very well balanced. In both parts, the partitioning cost
for the subproblems shown with elliptical shading is Θ(n). Yet the subproblems
remaining to be solved in (a), shown with square shading, are no larger than the
corresponding subproblems remaining to be solved in (b).
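To get a feel for the two depths in the constant-proportion argument, they can be evaluated numerically. The helper below is a hypothetical illustration that ignores integer round-off: `fraction` is the share of elements sent to one side of every split, and the function name is made up here.

```python
import math


def leaf_depths(n, fraction):
    """Approximate depths of the shallowest and deepest leaves of the
    recursion tree when every split is fraction : (1 - fraction)."""
    small = min(fraction, 1 - fraction)
    large = max(fraction, 1 - fraction)
    shallowest = math.log(n) / math.log(1 / small)   # follow the small side
    deepest = math.log(n) / math.log(1 / large)      # follow the large side
    return shallowest, deepest


if __name__ == "__main__":
    n = 10 ** 6
    for fraction in (0.5, 0.9, 0.99):
        lo, hi = leaf_depths(n, fraction)
        print(f"{fraction:.2f}: shallowest ~ {lo:.1f}, deepest ~ {hi:.1f}, "
              f"lg n ~ {math.log2(n):.1f}")
```

For n = 10^6 and a 9-to-1 split the depths come out to about 6 and about 131. Both are constant multiples of lg n, which is why the total cost stays O(n lg n).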

Intuition  for  the  average  case 

To develop a clear notion of the randomized behavior of quicksort, we must make
an assumption about how frequently we expect to encounter the various inputs.
The behavior of quicksort depends on the relative ordering of the values in the
array elements given as the input, and not on the particular values in the
array. As in our probabilistic analysis of the hiring problem in Section 5.2, we
will assume for now that all permutations of the input numbers are equally
likely.

When we run quicksort on a random input array, the partitioning is highly
unlikely to happen in the same way at every level, as our informal analysis has
assumed. We expect that some of the splits will be reasonably well balanced and
that some will be fairly unbalanced. For example, Exercise 7.2-6 asks you to
show that about 80 percent of the time Partition produces a split that is more
balanced than 9 to 1, and about 20 percent of the time it produces a split that
is less balanced than 9 to 1.

In the average case, Partition produces a mix of “good” and “bad” splits. In a
recursion tree for an average-case execution of Partition, the good and bad
splits are distributed randomly throughout the tree. Suppose, for the sake of
intuition, that the good and bad splits alternate levels in the tree, and that
the good splits are best-case splits and the bad splits are worst-case splits.
Figure 7.5(a) shows the splits at two consecutive levels in the recursion tree.
At the root of the tree, the cost is n for partitioning, and the subarrays
produced have sizes n - 1 and 0: the worst case. At the next level, the subarray
of size n - 1 undergoes best-case partitioning into subarrays of size
(n - 1)/2 - 1 and (n - 1)/2. Let’s assume that the boundary-condition cost is 1
for the subarray of size 0.

The combination of the bad split followed by the good split produces three
subarrays of sizes 0, (n - 1)/2 - 1, and (n - 1)/2 at a combined partitioning
cost of Θ(n) + Θ(n - 1) = Θ(n). Certainly, this situation is no worse than that
in Figure 7.5(b), namely a single level of partitioning that produces two
subarrays of size (n - 1)/2, at a cost of Θ(n). Yet this latter situation is
balanced! Intuitively, the Θ(n - 1) cost of the bad split can be absorbed into
the Θ(n) cost of the good split, and the resulting split is good. Thus, the
running time of quicksort, when levels alternate between good and bad splits, is
like the running time for good splits alone: still O(n lg n), but with a
slightly larger constant hidden by the O-notation. We shall give a rigorous
analysis of the expected running time of a randomized version of quicksort in
Section 7.4.2.
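The absorption argument can be written as a pair of recurrences. In this sketch, L(n) and U(n) are names introduced here for illustration (they are not notation from the text): L(n) is the running time when the top-level split is good and U(n) when it is bad, with good and bad levels alternating and floors and ceilings ignored.

```latex
\begin{align*}
L(n) &= 2\,U(n/2) + \Theta(n), \\
U(n) &= L(n-1) + \Theta(n).
\end{align*}
Substituting the second recurrence into the first gives
\begin{align*}
L(n) &= 2\bigl(L(n/2 - 1) + \Theta(n/2)\bigr) + \Theta(n) \\
     &= 2\,L(n/2 - 1) + \Theta(n) \\
     &= O(n \lg n).
\end{align*}
```

The extra Θ(n/2) term per pair of levels is what enlarges the hidden constant without changing the asymptotic bound.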

Exercises 


7.2-1 

Use the substitution method to prove that the recurrence T(n) = T(n - 1) + Θ(n)
has the solution T(n) = Θ(n²), as claimed at the beginning of Section 7.2.


7.2-2 

What  is  the  running  time  of  QUICKSORT  when  all  elements  of  array  A  have  the 
same  value? 


7.2-3 

Show that the running time of Quicksort is Θ(n²) when the array A contains
distinct elements and is sorted in decreasing order.


7.2-4 

Banks often record transactions on an account in order of the times of the
transactions, but many people like to receive their bank statements with checks
listed in order by check number. People usually write checks in order by check
number, and merchants usually cash them with reasonable dispatch. The problem of
converting time-of-transaction ordering to check-number ordering is therefore
the problem of sorting almost-sorted input. Argue that the procedure
Insertion-Sort would tend to beat the procedure QUICKSORT on this problem.


7.2-5 

Suppose that the splits at every level of quicksort are in the proportion
1 - α to α, where 0 < α ≤ 1/2 is a constant. Show that the minimum depth of a
leaf in the recursion tree is approximately -lg n / lg α and the maximum depth
is approximately -lg n / lg(1 - α). (Don’t worry about integer round-off.)

7.2-6 *

Argue that for any constant 0 < α ≤ 1/2, the probability is approximately
1 - 2α that on a random input array, PARTITION produces a split more balanced
than 1 - α to α.


7.3  A  randomized  version  of  quicksort 

In exploring the average-case behavior of quicksort, we have made an assumption
that all permutations of the input numbers are equally likely. In an engineering
situation, however, we cannot always expect this assumption to hold. (See
Exercise 7.2-4.) As we saw in Section 5.3, we can sometimes add randomization to
an algorithm in order to obtain good expected performance over all inputs. Many
people regard the resulting randomized version of quicksort as the sorting
algorithm of choice for large enough inputs.

In Section 5.3, we randomized our algorithm by explicitly permuting the input.
We could do so for quicksort also, but a different randomization technique,
called random sampling, yields a simpler analysis. Instead of always using A[r]
as the pivot, we will select a randomly chosen element from the subarray
A[p..r]. We do so by first exchanging element A[r] with an element chosen at
random from A[p..r]. By randomly sampling the range p, ..., r, we ensure that
the pivot element x = A[r] is equally likely to be any of the r - p + 1 elements
in the subarray. Because we randomly choose the pivot element, we expect the
split of the input array to be reasonably well balanced on average.

The  changes  to  PARTITION  and  QUICKSORT  are  small.  In  the  new  partition 
procedure,  we  simply  implement  the  swap  before  actually  partitioning: 

Randomized-Partition(A, p, r)

1  i = Random(p, r)
2  exchange A[r] with A[i]
3  return Partition(A, p, r)

The  new  quicksort  calls  Randomized-Partition  in  place  of  Partition: 

Randomized-Quicksort(A, p, r)

1  if p < r
2      q = Randomized-Partition(A, p, r)
3      Randomized-Quicksort(A, p, q - 1)
4      Randomized-Quicksort(A, q + 1, r)
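A Python sketch of the randomized variant is shown below. It assumes 0-based indices and uses `randint` from the standard `random` module, which returns an integer from the closed range [p, r], in the role of RANDOM(p, r). The optional `rng` parameter is an addition made here so that a seeded generator can be passed in for reproducible runs.

```python
import random


def partition(a, p, r):
    """Partition a[p..r] in place around the pivot a[r]; return its final index."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_partition(a, p, r, rng=random):
    """Swap a uniformly chosen element of a[p..r] into position r, then partition."""
    i = rng.randint(p, r)          # both endpoints are included
    a[r], a[i] = a[i], a[r]
    return partition(a, p, r)


def randomized_quicksort(a, p=0, r=None, rng=random):
    """Sort a[p..r] in place using randomly chosen pivots."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = randomized_partition(a, p, r, rng)
        randomized_quicksort(a, p, q - 1, rng)
        randomized_quicksort(a, q + 1, r, rng)
```

Only the pivot choice differs from the deterministic version; the partitioning itself is unchanged.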

We analyze this algorithm in the next section.

Exercises


7.3-1 

Why  do  we  analyze  the  expected  running  time  of  a  randomized  algorithm  and  not 
its  worst-case  running  time? 


7.3-2 

When Randomized-Quicksort runs, how many calls are made to the random-number
generator Random in the worst case? How about in the best case? Give
your answer in terms of Θ-notation.


7.4  Analysis  of  quicksort 

Section 7.2 gave some intuition for the worst-case behavior of quicksort and for
why we expect it to run quickly. In this section, we analyze the behavior of
quicksort more rigorously. We begin with a worst-case analysis, which applies to
either Quicksort or Randomized-Quicksort, and conclude with an analysis of the
expected running time of Randomized-Quicksort.

7.4.1  Worst-case  analysis 

We saw in Section 7.2 that a worst-case split at every level of recursion in quicksort
produces a $\Theta(n^2)$ running time, which, intuitively, is the worst-case running time
of the algorithm. We now prove this assertion.

Using the substitution method (see Section 4.3), we can show that the running
time of quicksort is $O(n^2)$. Let $T(n)$ be the worst-case time for the procedure
QUICKSORT on an input of size $n$. We have the recurrence

$$T(n) = \max_{0 \le q \le n-1} \bigl(T(q) + T(n-q-1)\bigr) + \Theta(n), \tag{7.1}$$

where the parameter $q$ ranges from 0 to $n - 1$ because the procedure PARTITION
produces two subproblems with total size $n - 1$. We guess that $T(n) \le cn^2$ for
some constant $c$. Substituting this guess into recurrence (7.1), we obtain

$$\begin{aligned}
T(n) &\le \max_{0 \le q \le n-1} \bigl(cq^2 + c(n-q-1)^2\bigr) + \Theta(n) \\
     &= c \cdot \max_{0 \le q \le n-1} \bigl(q^2 + (n-q-1)^2\bigr) + \Theta(n).
\end{aligned}$$

The expression $q^2 + (n-q-1)^2$ achieves a maximum over the parameter’s
range $0 \le q \le n-1$ at either endpoint. To verify this claim, note that the second
derivative of the expression with respect to $q$ is positive (see Exercise 7.4-3). This
observation gives us the bound $\max_{0 \le q \le n-1} (q^2 + (n-q-1)^2) \le (n-1)^2 = n^2 - 2n + 1$.
Continuing with our bounding of $T(n)$, we obtain

$$\begin{aligned}
T(n) &\le cn^2 - c(2n-1) + \Theta(n) \\
     &\le cn^2,
\end{aligned}$$

since we can pick the constant $c$ large enough so that the $c(2n-1)$ term dominates
the $\Theta(n)$ term. Thus, $T(n) = O(n^2)$. We saw in Section 7.2 a specific
case in which quicksort takes $\Omega(n^2)$ time: when partitioning is unbalanced.
Alternatively, Exercise 7.4-1 asks you to show that recurrence (7.1) has a solution of
$T(n) = \Omega(n^2)$. Thus, the (worst-case) running time of quicksort is $\Theta(n^2)$.
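
As a sanity check on the argument above, recurrence (7.1) can be evaluated numerically for small $n$. The sketch below makes two illustrative assumptions that are not part of the analysis: the $\Theta(n)$ term is taken to be exactly $n$, and $T(0) = 0$. Under those assumptions it records which values of $q$ attain the maximum.

```python
def worst_case_table(n_max):
    """Evaluate T(n) = max_{0<=q<=n-1} (T(q) + T(n-q-1)) + n with T(0) = 0.

    Assumption: the Theta(n) term of recurrence (7.1) is taken to be exactly n.
    Returns (T, argmax), where argmax[n] lists every q attaining the maximum.
    """
    T = [0] * (n_max + 1)
    argmax = [[] for _ in range(n_max + 1)]
    for n in range(1, n_max + 1):
        costs = [T[q] + T[n - q - 1] for q in range(n)]
        best = max(costs)
        T[n] = best + n
        argmax[n] = [q for q, cost in enumerate(costs) if cost == best]
    return T, argmax

T, argmax = worst_case_table(8)
print(T)          # [0, 1, 3, 6, 10, 15, 21, 28, 36]
print(argmax[8])  # [0, 7]
```

With these choices the table is $T(n) = n(n+1)/2$, and the maximum is attained only at the endpoints $q = 0$ and $q = n - 1$, in line with the convexity argument in the text.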

7.4.2  Expected  running  time 

We have already seen the intuition behind why the expected running time of
RANDOMIZED-QUICKSORT is $O(n \lg n)$: if, in each level of recursion, the split
induced by RANDOMIZED-PARTITION puts any constant fraction of the elements
on one side of the partition, then the recursion tree has depth $\Theta(\lg n)$, and $O(n)$
work is performed at each level. Even if we add a few new levels with the most
unbalanced split possible between these levels, the total time remains $O(n \lg n)$. We
can analyze the expected running time of RANDOMIZED-QUICKSORT precisely
by first understanding how the partitioning procedure operates and then using this
understanding to derive an $O(n \lg n)$ bound on the expected running time. This
upper bound on the expected running time, combined with the $\Theta(n \lg n)$ best-case
bound we saw in Section 7.2, yields a $\Theta(n \lg n)$ expected running time. We assume
throughout that the values of the elements being sorted are distinct.

Running  time  and  comparisons 

The  Quicksort  and  Randomized-Quicksort  procedures  differ  only  in  how 
they  select  pivot  elements;  they  are  the  same  in  all  other  respects.  We  can  therefore 
couch  our  analysis  of  RANDOMIZED-QUICKSORT  by  discussing  the  QUICKSORT 
and  Partition  procedures,  but  with  the  assumption  that  pivot  elements  are  se¬ 
lected  randomly  from  the  subarray  passed  to  Randomized-Partition. 

The running time of QUICKSORT is dominated by the time spent in the PARTITION
procedure. Each time the PARTITION procedure is called, it selects a pivot
element, and this element is never included in any future recursive calls to QUICKSORT
and PARTITION. Thus, there can be at most $n$ calls to PARTITION over the
entire execution of the quicksort algorithm. One call to PARTITION takes $O(1)$
time plus an amount of time that is proportional to the number of iterations of the
for loop in lines 3-6. Each iteration of this for loop performs a comparison in
line 4, comparing the pivot element to another element of the array A. Therefore,
if we can count the total number of times that line 4 is executed, we can bound the
total time spent in the for loop during the entire execution of QUICKSORT.

Lemma  7.1 

Let  X  be  the  number  of  comparisons  performed  in  line  4  of  PARTITION  over  the 
entire  execution  of  QUICKSORT  on  an  n-element  array.  Then  the  running  time  of 
QUICKSORT is $O(n + X)$.

Proof  By the discussion above, the algorithm makes at most $n$ calls to PARTITION,
each of which does a constant amount of work and then executes the for
loop some number of times. Each iteration of the for loop executes line 4.  ■

Our  goal,  therefore,  is  to  compute  X,  the  total  number  of  comparisons  performed 
in  all  calls  to  PARTITION.  We  will  not  attempt  to  analyze  how  many  comparisons 
are  made  in  each  call  to  PARTITION.  Rather,  we  will  derive  an  overall  bound  on  the 
total  number  of  comparisons.  To  do  so,  we  must  understand  when  the  algorithm 
compares two elements of the array and when it does not. For ease of analysis, we
rename the elements of the array A as $z_1, z_2, \ldots, z_n$, with $z_i$ being the $i$th smallest
element. We also define the set $Z_{ij} = \{z_i, z_{i+1}, \ldots, z_j\}$ to be the set of elements
between $z_i$ and $z_j$, inclusive.

When does the algorithm compare $z_i$ and $z_j$? To answer this question, we first
observe  that  each  pair  of  elements  is  compared  at  most  once.  Why?  Elements 
are  compared  only  to  the  pivot  element  and,  after  a  particular  call  of  PARTITION 
finishes,  the  pivot  element  used  in  that  call  is  never  again  compared  to  any  other 
elements. 

Our analysis uses indicator random variables (see Section 5.2). We define

$$X_{ij} = \mathrm{I}\{z_i \text{ is compared to } z_j\},$$

where we are considering whether the comparison takes place at any time during
the execution of the algorithm, not just during one iteration or one call of PARTITION.
Since each pair is compared at most once, we can easily characterize the
total number of comparisons performed by the algorithm:

$$X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}.$$


Taking expectations of both sides, and then using linearity of expectation and
Lemma 5.1, we obtain

$$\begin{aligned}
E[X] &= E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}\right] \\
     &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}] \\
     &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr\{z_i \text{ is compared to } z_j\}.
\end{aligned} \tag{7.2}$$

It remains to compute $\Pr\{z_i \text{ is compared to } z_j\}$. Our analysis assumes that the
RANDOMIZED-PARTITION procedure chooses each pivot randomly and independently.

Let  us  think  about  when  two  items  are  not  compared.  Consider  an  input  to 
quicksort  of  the  numbers  1  through  10  (in  any  order),  and  suppose  that  the  first 
pivot  element  is  7.  Then  the  first  call  to  PARTITION  separates  the  numbers  into  two 
sets: {1, 2, 3, 4, 5, 6} and {8, 9, 10}. In doing so, the pivot element 7 is compared
to  all  other  elements,  but  no  number  from  the  first  set  (e.g.,  2)  is  or  ever  will  be 
compared  to  any  number  from  the  second  set  (e.g.,  9). 

In general, because we assume that element values are distinct, once a pivot $x$
is chosen with $z_i < x < z_j$, we know that $z_i$ and $z_j$ cannot be compared at any
subsequent time. If, on the other hand, $z_i$ is chosen as a pivot before any other item
in $Z_{ij}$, then $z_i$ will be compared to each item in $Z_{ij}$, except for itself. Similarly,
if $z_j$ is chosen as a pivot before any other item in $Z_{ij}$, then $z_j$ will be compared to
each item in $Z_{ij}$, except for itself. In our example, the values 7 and 9 are compared
because 7 is the first item from $Z_{7,9}$ to be chosen as a pivot. In contrast, 2 and 9 will
never be compared because the first pivot element chosen from $Z_{2,9}$ is 7. Thus, $z_i$
and $z_j$ are compared if and only if the first element to be chosen as a pivot from $Z_{ij}$
is either $z_i$ or $z_j$.
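
The characterization above can be checked exactly on small inputs. The sketch below is an illustration rather than part of the proof: it tracks only the ranks held by the subarray that currently contains both $z_i$ and $z_j$, assumes each pivot is drawn uniformly from that subarray, and computes the comparison probability with exact rational arithmetic. The function name and its parameters are introduced here for the example.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def compare_probability(lo, hi, i, j):
    """Exact probability that ranks i < j are ever compared, given that the
    current subarray holds exactly the ranks lo..hi (lo <= i < j <= hi) and
    every pivot is chosen uniformly from the current subarray."""
    size = hi - lo + 1
    total = Fraction(0)
    for pivot in range(lo, hi + 1):
        if pivot == i or pivot == j:
            total += 1      # the pivot is compared with every other element
        elif pivot < i:
            total += compare_probability(pivot + 1, hi, i, j)
        elif pivot > j:
            total += compare_probability(lo, pivot - 1, i, j)
        # i < pivot < j: the two elements are separated and never compared
    return total / size

print(compare_probability(1, 10, 7, 9))  # 2/3
print(compare_probability(1, 10, 2, 9))  # 1/4
```

In every case tried the result depends only on $j - i$, not on the size of the enclosing array, and equals $2/(j - i + 1)$, the value the text derives next.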

We now compute the probability that this event occurs. Prior to the point at
which an element from $Z_{ij}$ has been chosen as a pivot, the whole set $Z_{ij}$ is together
in the same partition. Therefore, any element of $Z_{ij}$ is equally likely to be the first
one chosen as a pivot. Because the set $Z_{ij}$ has $j - i + 1$ elements, and because pivots
are chosen randomly and independently, the probability that any given element is
the first one chosen as a pivot is $1/(j - i + 1)$. Thus, we have

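$$\begin{aligned}
\Pr\{z_i \text{ is compared to } z_j\}
  &= \Pr\{z_i \text{ or } z_j \text{ is first pivot chosen from } Z_{ij}\} \\
  &= \Pr\{z_i \text{ is first pivot chosen from } Z_{ij}\}
     + \Pr\{z_j \text{ is first pivot chosen from } Z_{ij}\} \\
  &= \frac{1}{j-i+1} + \frac{1}{j-i+1} \\
  &= \frac{2}{j-i+1}.
\end{aligned} \tag{7.3}$$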

The second line follows because the two events are mutually exclusive. Combining
equations (7.2) and (7.3), we get that

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1}.$$


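We can evaluate this sum using a change of variables ($k = j - i$) and the bound
on the harmonic series in equation (A.7):

$$\begin{aligned}
E[X] &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1} \\
     &= \sum_{i=1}^{n-1} \sum_{k=1}^{n-i} \frac{2}{k+1} \\
     &< \sum_{i=1}^{n-1} \sum_{k=1}^{n} \frac{2}{k} \\
     &= \sum_{i=1}^{n-1} O(\lg n) \\
     &= O(n \lg n).
\end{aligned} \tag{7.4}$$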


Thus we conclude that, using RANDOMIZED-PARTITION, the expected running
time of quicksort is $O(n \lg n)$ when element values are distinct.
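
For small $n$, the expected number of comparisons can be computed exactly in two independent ways, which gives a useful cross-check on the analysis. The sketch below evaluates the double sum obtained from equations (7.2) and (7.3), and separately evaluates a recurrence that conditions on the rank of the first pivot. The recurrence assumes that a call on $m$ elements performs exactly $m - 1$ comparisons in line 4 of PARTITION; the function names are introduced here for illustration.

```python
from fractions import Fraction

def expected_comparisons_sum(n):
    """E[X] as the double sum of 2/(j - i + 1) over 1 <= i < j <= n."""
    return sum(Fraction(2, j - i + 1)
               for i in range(1, n)
               for j in range(i + 1, n + 1))

def expected_comparisons_recurrence(n):
    """E[X] by conditioning on the first pivot: a call on m elements makes
    m - 1 comparisons, and the pivot's rank is uniform over 1..m, so
    C(m) = (m - 1) + (2/m) * (C(0) + C(1) + ... + C(m - 1))."""
    c = [Fraction(0)] * (n + 1)
    for m in range(2, n + 1):
        c[m] = (m - 1) + Fraction(2, m) * sum(c[:m])
    return c[n]

print(expected_comparisons_sum(3))         # 8/3
print(expected_comparisons_recurrence(3))  # 8/3
```

Exact rationals are used instead of floating point so that the two computations can be compared with `==`.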


Exercises 


7.4-1

Show that in the recurrence

$$T(n) = \max_{0 \le q \le n-1} \bigl(T(q) + T(n-q-1)\bigr) + \Theta(n),$$

$T(n) = \Omega(n^2)$.

7.4-2

Show that quicksort’s best-case running time is $\Omega(n \lg n)$.

7.4-3

Show that the expression $q^2 + (n-q-1)^2$ achieves a maximum over
$q = 0, 1, \ldots, n-1$ when $q = 0$ or $q = n - 1$.

7.4-4

Show that RANDOMIZED-QUICKSORT’s expected running time is $\Omega(n \lg n)$.

7.4-5

We  can  improve  the  running  time  of  quicksort  in  practice  by  taking  advantage  of  the 
fast  running  time  of  insertion  sort  when  its  input  is  “nearly”  sorted.  Upon  calling 
quicksort  on  a  subarray  with  fewer  than  k  elements,  let  it  simply  return  without 
sorting  the  subarray.  After  the  top-level  call  to  quicksort  returns,  run  insertion  sort 
on  the  entire  array  to  finish  the  sorting  process.  Argue  that  this  sorting  algorithm 
runs in $O(nk + n \lg(n/k))$ expected time. How should we pick $k$, both in theory
and  in  practice? 

7.4-6 *

Consider modifying the PARTITION procedure by randomly picking three elements
from array A and partitioning about their median (the middle value of the three
elements). Approximate the probability of getting at worst an $\alpha$-to-$(1-\alpha)$ split, as
a function of $\alpha$ in the range $0 < \alpha < 1$.


Problems 


7-1  Hoare  partition  correctness 

The  version  of  PARTITION  given  in  this  chapter  is  not  the  original  partitioning 
algorithm.  Here  is  the  original  partition  algorithm,  which  is  due  to  C.  A.  R.  Hoare: 

HOARE-PARTITION(A, p, r)
 1  x = A[p]
 2  i = p - 1
 3  j = r + 1
 4  while TRUE
 5      repeat
 6          j = j - 1
 7      until A[j] ≤ x
 8      repeat
 9          i = i + 1
10      until A[i] ≥ x
11      if i < j
12          exchange A[i] with A[j]
13      else return j
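
For experimenting with the procedure, HOARE-PARTITION can be transcribed into Python as below. This is a sketch under stated assumptions: indices are 0-based, and each repeat-until loop is written as one unconditional step followed by a `while` loop, which has the same effect. It is meant as a tool for exploring the questions that follow, not as a substitute for the arguments they ask for.

```python
def hoare_partition(a, p, r):
    """Partition a[p..r] (inclusive, 0-based) around x = a[p], following
    HOARE-PARTITION, and return the final value of j."""
    x = a[p]
    i = p - 1
    j = r + 1
    while True:
        # repeat j = j - 1 until a[j] <= x
        j -= 1
        while a[j] > x:
            j -= 1
        # repeat i = i + 1 until a[i] >= x
        i += 1
        while a[i] < x:
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j

data = [5, 3, 8, 1]
j = hoare_partition(data, 0, len(data) - 1)
print(j, data)  # 1 [1, 3, 8, 5]
```

In this small run the returned index splits the list into [1, 3] and [8, 5]; unlike PARTITION, the pivot value 5 is not placed in a final position of its own.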

a. Demonstrate the operation of HOARE-PARTITION on the array
$A = \langle 13, 19, 9, 5, 12, 8, 7, 4, 11, 2, 6, 21 \rangle$, showing the values of the array and auxiliary values
after each iteration of the while loop in lines 4-13.

The next three questions ask you to give a careful argument that the procedure
HOARE-PARTITION is correct. Assuming that the subarray $A[p..r]$ contains at
least two elements, prove the following:

b. The indices $i$ and $j$ are such that we never access an element of A outside the
subarray $A[p..r]$.

c. When HOARE-PARTITION terminates, it returns a value $j$ such that $p \le j < r$.

d. Every element of $A[p..j]$ is less than or equal to every element of $A[j+1..r]$
when HOARE-PARTITION terminates.

The PARTITION procedure in Section 7.1 separates the pivot value (originally
in $A[r]$) from the two partitions it forms. The HOARE-PARTITION procedure, on
the other hand, always places the pivot value (originally in $A[p]$) into one of the
two partitions $A[p..j]$ and $A[j+1..r]$. Since $p \le j < r$, this split is always
nontrivial.

e.  Rewrite  the  Quicksort  procedure  to  use  Hoare-Partition. 

7-2  Quicksort  with  equal  element  values 

The  analysis  of  the  expected  running  time  of  randomized  quicksort  in  Section  7.4.2 
assumes  that  all  element  values  are  distinct.  In  this  problem,  we  examine  what 
happens  when  they  are  not. 

a.  Suppose  that  all  element  values  are  equal.  What  would  be  randomized  quick¬ 
sort’s  running  time  in  this  case? 

b. The PARTITION procedure returns an index $q$ such that each element of
$A[p..q-1]$ is less than or equal to $A[q]$ and each element of $A[q+1..r]$
is greater than $A[q]$. Modify the PARTITION procedure to produce a procedure
PARTITION'(A, p, r), which permutes the elements of $A[p..r]$ and returns two
indices $q$ and $t$, where $p \le q \le t \le r$, such that

•  all elements of $A[q..t]$ are equal,

•  each element of $A[p..q-1]$ is less than $A[q]$, and

•  each element of $A[t+1..r]$ is greater than $A[q]$.

Like PARTITION, your PARTITION' procedure should take $\Theta(r - p)$ time.

c. Modify the RANDOMIZED-PARTITION procedure to call PARTITION', and
name the new procedure RANDOMIZED-PARTITION'. Then modify the
QUICKSORT procedure to produce a procedure QUICKSORT'(A, p, r) that calls
RANDOMIZED-PARTITION' and recurses only on partitions of elements not
known to be equal to each other.

d.  Using  Quicksort',  how  would  you  adjust  the  analysis  in  Section  7.4.2  to 
avoid  the  assumption  that  all  elements  are  distinct? 


7-3  Alternative  quicksort  analysis 

An alternative analysis of the running time of randomized quicksort focuses on
the expected running time of each individual recursive call to RANDOMIZED-QUICKSORT,
rather than on the number of comparisons performed.

a. Argue that, given an array of size $n$, the probability that any particular element
is chosen as the pivot is $1/n$. Use this to define indicator random variables
$X_i = \mathrm{I}\{i\text{th smallest element is chosen as the pivot}\}$. What is $E[X_i]$?

b. Let $T(n)$ be a random variable denoting the running time of quicksort on an
array of size $n$. Argue that

$$E[T(n)] = E\left[\sum_{q=1}^{n} X_q \bigl(T(q-1) + T(n-q) + \Theta(n)\bigr)\right]. \tag{7.5}$$

c. Show that we can rewrite equation (7.5) as

$$E[T(n)] = \frac{2}{n} \sum_{q=2}^{n-1} E[T(q)] + \Theta(n). \tag{7.6}$$


d. Show that

$$\sum_{k=2}^{n-1} k \lg k \le \frac{1}{2} n^2 \lg n - \frac{1}{8} n^2. \tag{7.7}$$

(Hint: Split the summation into two parts, one for $k = 2, 3, \ldots, \lceil n/2 \rceil - 1$ and
one for $k = \lceil n/2 \rceil, \ldots, n - 1$.)

e. Using the bound from equation (7.7), show that the recurrence in equation (7.6)
has the solution $E[T(n)] = \Theta(n \lg n)$. (Hint: Show, by substitution, that
$E[T(n)] \le an \lg n$ for sufficiently large $n$ and for some positive constant $a$.)

7-4  Stack depth for quicksort

The  Quicksort  algorithm  of  Section  7.1  contains  two  recursive  calls  to  itself. 
After  Quicksort  calls  Partition,  it  recursively  sorts  the  left  subarray  and  then 
it  recursively  sorts  the  right  subarray.  The  second  recursive  call  in  QUICKSORT 
is  not  really  necessary;  we  can  avoid  it  by  using  an  iterative  control  structure. 
This  technique,  called  tail  recursion,  is  provided  automatically  by  good  compilers. 
Consider  the  following  version  of  quicksort,  which  simulates  tail  recursion: 

TAIL-RECURSIVE-QUICKSORT(A, p, r)
1  while p < r
2      // Partition and sort left subarray.
3      q = PARTITION(A, p, r)
4      TAIL-RECURSIVE-QUICKSORT(A, p, q - 1)
5      p = q + 1

a. Argue that TAIL-RECURSIVE-QUICKSORT(A, 1, A.length) correctly sorts the
array A.

Compilers usually execute recursive procedures by using a stack that contains pertinent
information, including the parameter values, for each recursive call. The
information for the most recent call is at the top of the stack, and the information
for the initial call is at the bottom. Upon calling a procedure, its information is
pushed onto the stack; when it terminates, its information is popped. Since we
assume that array parameters are represented by pointers, the information for each
procedure call on the stack requires $O(1)$ stack space. The stack depth is the maximum
amount of stack space used at any time during a computation.

b. Describe a scenario in which TAIL-RECURSIVE-QUICKSORT’s stack depth is
$\Theta(n)$ on an $n$-element input array.

c. Modify the code for TAIL-RECURSIVE-QUICKSORT so that the worst-case
stack depth is $\Theta(\lg n)$. Maintain the $O(n \lg n)$ expected running time of the
algorithm.

7-5  Median-of-3  partition 

One way to improve the RANDOMIZED-QUICKSORT procedure is to partition
around a pivot that is chosen more carefully than by picking a random element
from the subarray. One common approach is the median-of-3 method: choose
the pivot as the median (middle element) of a set of 3 elements randomly selected
from the subarray. (See Exercise 7.4-6.) For this problem, let us assume that the
elements in the input array $A[1..n]$ are distinct and that $n \ge 3$. We denote the
sorted output array by $A'[1..n]$. Using the median-of-3 method to choose the
pivot element $x$, define $p_i = \Pr\{x = A'[i]\}$.

a. Give an exact formula for $p_i$ as a function of $n$ and $i$ for $i = 2, 3, \ldots, n - 1$.
(Note that $p_1 = p_n = 0$.)

b. By what amount have we increased the likelihood of choosing the pivot as
$x = A'[\lfloor (n+1)/2 \rfloor]$, the median of $A[1..n]$, compared with the ordinary
implementation? Assume that $n \to \infty$, and give the limiting ratio of these
probabilities.

c. If we define a “good” split to mean choosing the pivot as $x = A'[i]$, where
$n/3 \le i \le 2n/3$, by what amount have we increased the likelihood of getting
a good split compared with the ordinary implementation? (Hint: Approximate
the sum by an integral.)

d. Argue that in the $\Omega(n \lg n)$ running time of quicksort, the median-of-3 method
affects only the constant factor.

7-6  Fuzzy  sorting  of  intervals 

Consider a sorting problem in which we do not know the numbers exactly. Instead,
for each number, we know an interval on the real line to which it belongs.
That is, we are given $n$ closed intervals of the form $[a_i, b_i]$, where $a_i \le b_i$. We
wish to fuzzy-sort these intervals, i.e., to produce a permutation $\langle i_1, i_2, \ldots, i_n \rangle$ of
the intervals such that for $j = 1, 2, \ldots, n$, there exist $c_j \in [a_{i_j}, b_{i_j}]$ satisfying
$c_1 \le c_2 \le \cdots \le c_n$.

a. Design a randomized algorithm for fuzzy-sorting $n$ intervals. Your algorithm
should have the general structure of an algorithm that quicksorts the left endpoints
(the $a_i$ values), but it should take advantage of overlapping intervals to
improve the running time. (As the intervals overlap more and more, the problem
of fuzzy-sorting the intervals becomes progressively easier. Your algorithm
should take advantage of such overlapping, to the extent that it exists.)

b. Argue that your algorithm runs in expected time $\Theta(n \lg n)$ in general, but runs
in expected time $\Theta(n)$ when all of the intervals overlap (i.e., when there exists a
value $x$ such that $x \in [a_i, b_i]$ for all $i$). Your algorithm should not be checking
for this case explicitly; rather, its performance should naturally improve as the
amount of overlap increases.

Chapter notes

The  quicksort  procedure  was  invented  by  Hoare  [170];  Hoare’s  version  appears  in 
Problem  7-1.  The  PARTITION  procedure  given  in  Section  7.1  is  due  to  N.  Lomuto. 
The analysis in Section 7.4 is due to Avrim Blum. Sedgewick [305] and Bentley
[43] provide a good reference on the details of implementation and how they
matter.

McIlroy [248] showed how to engineer a “killer adversary” that produces an
array on which virtually any implementation of quicksort takes $\Omega(n^2)$ time. If the
implementation  is  randomized,  the  adversary  produces  the  array  after  seeing  the 
random  choices  of  the  quicksort  algorithm. 

We have now introduced several algorithms that can sort $n$ numbers in $O(n \lg n)$
time. Merge sort and heapsort achieve this upper bound in the worst case; quicksort
achieves it on average. Moreover, for each of these algorithms, we can produce a
sequence of $n$ input numbers that causes the algorithm to run in $\Omega(n \lg n)$ time.

These  algorithms  share  an  interesting  property:  the  sorted  order  they  determine 
is  based  only  on  comparisons  between  the  input  elements.  We  call  such  sorting 
algorithms  comparison  sorts.  All  the  sorting  algorithms  introduced  thus  far  are 
comparison  sorts. 

In Section 8.1, we shall prove that any comparison sort must make $\Omega(n \lg n)$
comparisons in the worst case to sort $n$ elements. Thus, merge sort and heapsort
are  asymptotically  optimal,  and  no  comparison  sort  exists  that  is  faster  by  more 
than  a  constant  factor. 

Sections  8.2,  8.3,  and  8.4  examine  three  sorting  algorithms— counting  sort,  radix 
sort,  and  bucket  sort— that  run  in  linear  time.  Of  course,  these  algorithms  use 
operations  other  than  comparisons  to  determine  the  sorted  order.  Consequently, 
the $\Omega(n \lg n)$ lower bound does not apply to them.


8.1  Lower  bounds  for  sorting 

In a comparison sort, we use only comparisons between elements to gain order
information about an input sequence $\langle a_1, a_2, \ldots, a_n \rangle$. That is, given two elements
$a_i$ and $a_j$, we perform one of the tests $a_i < a_j$, $a_i \le a_j$, $a_i = a_j$, $a_i \ge a_j$, or
$a_i > a_j$ to determine their relative order. We may not inspect the values of the
elements or gain order information about them in any other way.

In this section, we assume without loss of generality that all the input elements
are distinct. Given this assumption, comparisons of the form $a_i = a_j$ are useless,
so we can assume that no comparisons of this form are made. We also note that
the comparisons $a_i \le a_j$, $a_i \ge a_j$, $a_i > a_j$, and $a_i < a_j$ are all equivalent in that
they yield identical information about the relative order of $a_i$ and $a_j$. We therefore
assume that all comparisons have the form $a_i \le a_j$.

Figure 8.1  The decision tree for insertion sort operating on three elements. An internal node
annotated by $i:j$ indicates a comparison between $a_i$ and $a_j$. A leaf annotated by the permutation
$\langle \pi(1), \pi(2), \ldots, \pi(n) \rangle$ indicates the ordering $a_{\pi(1)} \le a_{\pi(2)} \le \cdots \le a_{\pi(n)}$. The shaded path
indicates the decisions made when sorting the input sequence $\langle a_1 = 6, a_2 = 8, a_3 = 5 \rangle$; the
permutation $\langle 3, 1, 2 \rangle$ at the leaf indicates that the sorted ordering is $a_3 = 5 \le a_1 = 6 \le a_2 = 8$.
There are $3! = 6$ possible permutations of the input elements, and so the decision tree must have at
least 6 leaves.

The  decision-tree  model 

We  can  view  comparison  sorts  abstractly  in  terms  of  decision  trees.  A  decision 
tree  is  a  full  binary  tree  that  represents  the  comparisons  between  elements  that 
are  performed  by  a  particular  sorting  algorithm  operating  on  an  input  of  a  given 
size.  Control,  data  movement,  and  all  other  aspects  of  the  algorithm  are  ignored. 
Figure  8.1  shows  the  decision  tree  corresponding  to  the  insertion  sort  algorithm 
from  Section  2. 1  operating  on  an  input  sequence  of  three  elements. 

In a decision tree, we annotate each internal node by $i:j$ for some $i$ and $j$ in the
range $1 \le i, j \le n$, where $n$ is the number of elements in the input sequence. We
also annotate each leaf by a permutation $\langle \pi(1), \pi(2), \ldots, \pi(n) \rangle$. (See Section C.1
for background on permutations.) The execution of the sorting algorithm corresponds
to tracing a simple path from the root of the decision tree down to a leaf.
Each internal node indicates a comparison $a_i \le a_j$. The left subtree then dictates
subsequent comparisons once we know that $a_i \le a_j$, and the right subtree dictates
subsequent comparisons knowing that $a_i > a_j$. When we come to a leaf, the sorting
algorithm has established the ordering $a_{\pi(1)} \le a_{\pi(2)} \le \cdots \le a_{\pi(n)}$. Because
any correct sorting algorithm must be able to produce each permutation of its input,
each of the $n!$ permutations on $n$ elements must appear as one of the leaves of the
decision tree for a comparison sort to be correct. Furthermore, each of these leaves
must be reachable from the root by a downward path corresponding to an actual
execution of the comparison sort. (We shall refer to such leaves as “reachable.”)
Thus, we shall consider only decision trees in which each permutation appears as
a reachable leaf.
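
To make the model concrete, the reachable leaves of a small decision tree can be enumerated by running a comparison sort on every permutation of its input and recording the outcome of each comparison. The sketch below does this for insertion sort on three elements. It is an illustration under stated assumptions: the helper name and the `(i, j, outcome)` trace format are introduced here, and every comparison is taken to have the form $a_i \le a_j$ with 1-based positions in the original input.

```python
from itertools import permutations

def insertion_sort_trace(values):
    """Insertion-sort `values` using comparisons only.

    Returns (trace, perm): trace lists each test "a_i <= a_j?" as a tuple
    (i, j, outcome) with 1-based input positions, and perm is the leaf
    permutation, i.e. the input positions listed in sorted order.
    """
    order = list(range(len(values)))   # input positions, sorted prefix grows
    trace = []
    for j in range(1, len(order)):
        key = order[j]
        i = j - 1
        while i >= 0:
            outcome = values[order[i]] <= values[key]
            trace.append((order[i] + 1, key + 1, outcome))
            if outcome:
                break
            order[i + 1] = order[i]
            i -= 1
        order[i + 1] = key
    return tuple(trace), tuple(pos + 1 for pos in order)

leaves = {insertion_sort_trace(p) for p in permutations((1, 2, 3))}
depths = sorted(len(trace) for trace, _ in leaves)
print(len(leaves), depths)               # 6 [2, 2, 3, 3, 3, 3]
print(insertion_sort_trace((6, 8, 5)))
# (((1, 2, True), (2, 3, False), (1, 3, False)), (3, 1, 2))
```

Each distinct trace is one root-to-leaf path. The six inputs reach six different leaves, and the input $\langle 6, 8, 5 \rangle$ ends at the leaf labeled $\langle 3, 1, 2 \rangle$, matching the caption of Figure 8.1.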

A  lower  bound  for  the  worst  case 

The length of the longest simple path from the root of a decision tree to any of
its reachable leaves represents the worst-case number of comparisons that the corresponding
sorting algorithm performs. Consequently, the worst-case number of
comparisons  for  a  given  comparison  sort  algorithm  equals  the  height  of  its  decision 
tree.  A  lower  bound  on  the  heights  of  all  decision  trees  in  which  each  permutation 
appears  as  a  reachable  leaf  is  therefore  a  lower  bound  on  the  running  time  of  any 
comparison  sort  algorithm.  The  following  theorem  establishes  such  a  lower  bound. 

Theorem  8.1 

Any comparison sort algorithm requires $\Omega(n \lg n)$ comparisons in the worst case.

Proof  From the preceding discussion, it suffices to determine the height of a
decision tree in which each permutation appears as a reachable leaf. Consider a
decision tree of height $h$ with $l$ reachable leaves corresponding to a comparison
sort on $n$ elements. Because each of the $n!$ permutations of the input appears as
some leaf, we have $n! \le l$. Since a binary tree of height $h$ has no more than $2^h$
leaves, we have

$$n! \le l \le 2^h,$$

which, by taking logarithms, implies

$$\begin{aligned}
h &\ge \lg(n!) && \text{(since the lg function is monotonically increasing)} \\
  &= \Omega(n \lg n) && \text{(by equation (3.19))}.
\end{aligned}$$  ■
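
The inequality $n! \le 2^h$ gives a concrete number for any particular $n$: the height must be at least $\lceil \lg(n!) \rceil$. The short sketch below computes that value exactly with integer arithmetic. The function name is introduced here for illustration, and the result is only the decision-tree lower bound, not a claim that some algorithm achieves it.

```python
import math

def min_worst_case_comparisons(n):
    """Smallest h with 2**h >= n!, i.e. ceil(lg(n!)), computed without floats."""
    return (math.factorial(n) - 1).bit_length()

for n in (3, 5, 12):
    print(n, min_worst_case_comparisons(n))
# 3 3
# 5 7
# 12 29
```

For $n = 3$ the bound is 3 comparisons, which matches the height of the insertion-sort tree in Figure 8.1.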


Corollary  8.2 

Heapsort  and  merge  sort  are  asymptotically  optimal  comparison  sorts. 

Proof  The $O(n \lg n)$ upper bounds on the running times for heapsort and merge
sort match the $\Omega(n \lg n)$ worst-case lower bound from Theorem 8.1.  ■

Exercises 


8.1-1 

What  is  the  smallest  possible  depth  of  a  leaf  in  a  decision  tree  for  a  comparison 
sort? 

8.1-2

Obtain asymptotically tight bounds on $\lg(n!)$ without using Stirling’s approximation.
Instead, evaluate the summation $\sum_{k=1}^{n} \lg k$ using techniques from
Section A.2.


8.1-3 

Show that there is no comparison sort whose running time is linear for at least half
of the $n!$ inputs of length $n$. What about a fraction of $1/n$ of the inputs of length $n$?
What about a fraction $1/2^n$?


8.1-4 

Suppose that you are given a sequence of $n$ elements to sort. The input sequence
consists of $n/k$ subsequences, each containing $k$ elements. The elements in a given
subsequence are all smaller than the elements in the succeeding subsequence and
larger than the elements in the preceding subsequence. Thus, all that is needed to
sort the whole sequence of length $n$ is to sort the $k$ elements in each of the $n/k$
subsequences. Show an $\Omega(n \lg k)$ lower bound on the number of comparisons
needed to solve this variant of the sorting problem. (Hint: It is not rigorous to
simply combine the lower bounds for the individual subsequences.)


8.2  Counting  sort 

Counting sort assumes that each of the $n$ input elements is an integer in the range
0 to $k$, for some integer $k$. When $k = O(n)$, the sort runs in $\Theta(n)$ time.

Counting  sort  determines,  for  each  input  element  x,  the  number  of  elements  less 
than  x.  It  uses  this  information  to  place  element  x  directly  into  its  position  in  the 
output  array.  For  example,  if  17  elements  are  less  than  x,  then  x  belongs  in  output 
position 18. We must modify this scheme slightly to handle the situation in which
several elements have the same value, since we do not want to put them all in the
same position.

In the code for counting sort, we assume that the input is an array $A[1..n]$, and
thus A.length = n. We require two other arrays: the array $B[1..n]$ holds the
sorted output, and the array $C[0..k]$ provides temporary working storage.

Figure 8.2  The operation of COUNTING-SORT on an input array $A[1..8]$, where each element
of A is a nonnegative integer no larger than $k = 5$. (a) The array A and the auxiliary array C after
line 5. (b) The array C after line 8. (c)-(e) The output array B and the auxiliary array C after one,
two, and three iterations of the loop in lines 10-12, respectively. Only the lightly shaded elements of
array B have been filled in. (f) The final sorted output array B.


COUNTING-SORT(A, B, k)
 1  let C[0..k] be a new array
 2  for i = 0 to k
 3      C[i] = 0
 4  for j = 1 to A.length
 5      C[A[j]] = C[A[j]] + 1
 6  // C[i] now contains the number of elements equal to i.
 7  for i = 1 to k
 8      C[i] = C[i] + C[i - 1]
 9  // C[i] now contains the number of elements less than or equal to i.
10  for j = A.length downto 1
11      B[C[A[j]]] = A[j]
12      C[A[j]] = C[A[j]] - 1

Figure 8.2 illustrates counting sort. After the for loop of lines 2-3 initializes the
array C to all zeros, the for loop of lines 4-5 inspects each input element. If the
value of an input element is i, we increment C[i]. Thus, after line 5, C[i] holds
the number of input elements equal to i for each integer i = 0, 1, ..., k. Lines 7-8
determine for each i = 0, 1, ..., k how many input elements are less than or equal
to i by keeping a running sum of the array C.

Finally, the for loop of lines 10-12 places each element A[j] into its correct
sorted position in the output array B. If all n elements are distinct, then when we
first enter line 10, for each A[j], the value C[A[j]] is the correct final position
of A[j] in the output array, since there are C[A[j]] elements less than or equal
to A[j]. Because the elements might not be distinct, we decrement C[A[j]] each
time we place a value A[j] into the B array. Decrementing C[A[j]] causes the
next input element with a value equal to A[j], if one exists, to go to the position
immediately before A[j] in the output array.
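The pseudocode can be sketched in Python as follows. This is an illustrative translation rather than the book's own code: the function name `counting_sort` and the sample input are assumptions, the function returns a new list instead of filling a caller-supplied array B, and because Python lists are 0-based the counter is decremented before it is used as an index.

```python
def counting_sort(a, k):
    """Return a sorted copy of a, whose items are integers in the range 0..k."""
    count = [0] * (k + 1)              # lines 1-3: C[0..k] initialized to zero
    for x in a:                        # lines 4-5: count[i] = number of items equal to i
        count[x] += 1
    for i in range(1, k + 1):          # lines 7-8: count[i] = number of items <= i
        count[i] += count[i - 1]
    out = [None] * len(a)
    for x in reversed(a):              # lines 10-12: scan the input right to left
        count[x] -= 1                  # 0-based output, so decrement before placing
        out[count[x]] = x
    return out


print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))  # [0, 0, 2, 2, 3, 3, 3, 5]
```

The three loops over the input and the counters correspond directly to the loops in lines 4-5, 7-8, and 10-12.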

How much time does counting sort require? The for loop of lines 2-3 takes
time Θ(k), the for loop of lines 4-5 takes time Θ(n), the for loop of lines 7-8 takes
time Θ(k), and the for loop of lines 10-12 takes time Θ(n). Thus, the overall time
is Θ(k + n). In practice, we usually use counting sort when we have k = O(n), in
which case the running time is Θ(n).

Counting sort beats the lower bound of Ω(n lg n) proved in Section 8.1 because
it is not a comparison sort. In fact, no comparisons between input elements occur
anywhere in the code. Instead, counting sort uses the actual values of the elements
to index into an array. The Ω(n lg n) lower bound for sorting does not apply when
we depart from the comparison sort model.

An important property of counting sort is that it is stable: numbers with the same
value appear in the output array in the same order as they do in the input array. That
is, it breaks ties between two numbers by the rule that whichever number appears
first in the input array appears first in the output array. Normally, the property of
stability is important only when satellite data are carried around with the element
being sorted. Counting sort's stability is important for another reason: counting
sort is often used as a subroutine in radix sort. As we shall see in the next section,
in order for radix sort to work correctly, counting sort must be stable.
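Stability becomes visible once each key carries satellite data. The sketch below is an illustration under assumptions not in the text: records are (key, label) tuples, and the helper name `counting_sort_by_key` and its `key` parameter are hypothetical.

```python
def counting_sort_by_key(records, k, key):
    """Stable counting sort of records whose key(record) is an integer in 0..k."""
    count = [0] * (k + 1)
    for r in records:
        count[key(r)] += 1
    for i in range(1, k + 1):
        count[i] += count[i - 1]
    out = [None] * len(records)
    for r in reversed(records):        # right-to-left scan keeps equal keys in input order
        count[key(r)] -= 1
        out[count[key(r)]] = r
    return out


records = [(3, "a"), (1, "b"), (3, "c"), (1, "d"), (0, "e")]
print(counting_sort_by_key(records, 3, key=lambda r: r[0]))
# [(0, 'e'), (1, 'b'), (1, 'd'), (3, 'a'), (3, 'c')]
```

Records with equal keys ("b" before "d", "a" before "c") leave in the order in which they arrived.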

Exercises 


8.2-1 

Using Figure 8.2 as a model, illustrate the operation of COUNTING-SORT on the
array A = ⟨6, 0, 2, 0, 1, 3, 4, 6, 1, 3, 2⟩.


8.2-2 

Prove  that  Counting-Sort  is  stable. 


8.2-3 

Suppose that we were to rewrite the for loop header in line 10 of COUNTING-SORT as

10  for j = 1 to A.length

Show that the algorithm still works properly. Is the modified algorithm stable?


8.2-4

Describe an algorithm that, given n integers in the range 0 to k, preprocesses its
input and then answers any query about how many of the n integers fall into a
range [a..b] in O(1) time. Your algorithm should use Θ(n + k) preprocessing
time.


8.3  Radix  sort 

Radix  sort  is  the  algorithm  used  by  the  card-sorting  machines  you  now  find  only  in 
computer  museums.  The  cards  have  80  columns,  and  in  each  column  a  machine  can 
punch  a  hole  in  one  of  12  places.  The  sorter  can  be  mechanically  “programmed” 
to  examine  a  given  column  of  each  card  in  a  deck  and  distribute  the  card  into  one 
of  12  bins  depending  on  which  place  has  been  punched.  An  operator  can  then 
gather  the  cards  bin  by  bin,  so  that  cards  with  the  first  place  punched  are  on  top  of 
cards  with  the  second  place  punched,  and  so  on. 

For decimal digits, each column uses only 10 places. (The other two places
are reserved for encoding nonnumeric characters.) A d-digit number would then
occupy a field of d columns. Since the card sorter can look at only one column
at a time, the problem of sorting n cards on a d-digit number requires a sorting
algorithm.

Intuitively,  you  might  sort  numbers  on  their  most  significant  digit,  sort  each  of 
the  resulting  bins  recursively,  and  then  combine  the  decks  in  order.  Unfortunately, 
since  the  cards  in  9  of  the  10  bins  must  be  put  aside  to  sort  each  of  the  bins,  this 
procedure  generates  many  intermediate  piles  of  cards  that  you  would  have  to  keep 
track  of.  (See  Exercise  8.3-5.) 

Radix sort solves the problem of card sorting, counterintuitively, by sorting on
the least significant digit first. The algorithm then combines the cards into a single
deck, with the cards in the 0 bin preceding the cards in the 1 bin preceding the
cards in the 2 bin, and so on. Then it sorts the entire deck again on the second-least
significant digit and recombines the deck in a like manner. The process continues
until the cards have been sorted on all d digits. Remarkably, at that point the cards
are fully sorted on the d-digit number. Thus, only d passes through the deck are
required to sort. Figure 8.3 shows how radix sort operates on a "deck" of seven
3-digit numbers.

In order for radix sort to work correctly, the digit sorts must be stable. The sort
performed by a card sorter is stable, but the operator has to be wary about not
changing the order of the cards as they come out of a bin, even though all the cards
in a bin have the same digit in the chosen column.

Figure 8.3 The operation of radix sort on a list of seven 3-digit numbers. The leftmost column is
the input. The remaining columns show the list after successive sorts on increasingly significant digit
positions. Shading indicates the digit position sorted on to produce each list from the previous one.


In a typical computer, which is a sequential random-access machine, we sometimes
use radix sort to sort records of information that are keyed by multiple fields.
For  example,  we  might  wish  to  sort  dates  by  three  keys:  year,  month,  and  day.  We 
could  run  a  sorting  algorithm  with  a  comparison  function  that,  given  two  dates, 
compares  years,  and  if  there  is  a  tie,  compares  months,  and  if  another  tie  occurs, 
compares  days.  Alternatively,  we  could  sort  the  information  three  times  with  a 
stable  sort:  first  on  day,  next  on  month,  and  finally  on  year. 
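Both approaches to sorting dates can be sketched in Python, whose built-in `sorted` is documented to be stable. The representation of a date as a (year, month, day) tuple and the two function names are assumptions made for this illustration.

```python
def sort_dates_by_passes(dates):
    """Sort (year, month, day) tuples with three stable passes."""
    result = sorted(dates, key=lambda d: d[2])   # least significant key first: day
    result = sorted(result, key=lambda d: d[1])  # then month
    result = sorted(result, key=lambda d: d[0])  # most significant key last: year
    return result


def sort_dates_by_comparison(dates):
    """Sort once, comparing year, then month, then day."""
    return sorted(dates)  # tuples compare lexicographically


dates = [(2021, 3, 14), (2020, 12, 25), (2021, 1, 31), (2020, 12, 1), (2021, 3, 2)]
print(sort_dates_by_passes(dates) == sort_dates_by_comparison(dates))  # True
```

The three-pass version works only because each later pass preserves the order established by the earlier ones among records that tie on the current key.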

The code for radix sort is straightforward. The following procedure assumes that
each element in the n-element array A has d digits, where digit 1 is the lowest-order
digit and digit d is the highest-order digit.

RADIX-SORT(A, d)
1  for i = 1 to d
2      use a stable sort to sort array A on digit i


Lemma  8.3 

Given n d-digit numbers in which each digit can take on up to k possible values,
Radix-Sort correctly sorts these numbers in Θ(d(n + k)) time if the stable sort
it uses takes Θ(n + k) time.

Proof The correctness of radix sort follows by induction on the column being
sorted (see Exercise 8.3-3). The analysis of the running time depends on the stable
sort used as the intermediate sorting algorithm. When each digit is in the range 0
to k - 1 (so that it can take on k possible values), and k is not too large, counting sort
is the obvious choice. Each pass over n d-digit numbers then takes time Θ(n + k).
There are d passes, and so the total time for radix sort is Θ(d(n + k)). ∎
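A minimal Python sketch of radix sort with counting sort as the stable intermediate sort follows. It assumes nonnegative integers of at most d base-k digits; the helper names `stable_sort_on_digit` and `radix_sort` and the sample numbers are chosen for illustration only.

```python
def stable_sort_on_digit(a, exp, k=10):
    """Stable counting sort of a on the digit (x // exp) % k."""
    count = [0] * k
    for x in a:
        count[(x // exp) % k] += 1
    for i in range(1, k):
        count[i] += count[i - 1]
    out = [0] * len(a)
    for x in reversed(a):              # right-to-left scan keeps the pass stable
        digit = (x // exp) % k
        count[digit] -= 1
        out[count[digit]] = x
    return out


def radix_sort(a, d, k=10):
    """Sort nonnegative integers that have at most d base-k digits."""
    exp = 1
    for _ in range(d):                 # digit 1 (lowest order) through digit d
        a = stable_sort_on_digit(a, exp, k)
        exp *= k
    return a


print(radix_sort([329, 457, 657, 839, 436, 720, 355], 3))
# [329, 355, 436, 457, 657, 720, 839]
```

Each call to `stable_sort_on_digit` costs Θ(n + k), and `radix_sort` makes d such calls, matching the Θ(d(n + k)) bound of the lemma.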

When d is constant and k = O(n), we can make radix sort run in linear time.
More generally, we have some flexibility in how to break each key into digits.

Lemma 8.4

Given n b-bit numbers and any positive integer r ≤ b, Radix-Sort correctly sorts
these numbers in Θ((b/r)(n + 2^r)) time if the stable sort it uses takes Θ(n + k)
time for inputs in the range 0 to k.

Proof For a value r ≤ b, we view each key as having d = ⌈b/r⌉ digits of r bits
each. Each digit is an integer in the range 0 to 2^r - 1, so that we can use counting
sort with k = 2^r - 1. (For example, we can view a 32-bit word as having four 8-bit
digits, so that b = 32, r = 8, k = 2^r - 1 = 255, and d = b/r = 4.) Each pass of
counting sort takes time Θ(n + k) = Θ(n + 2^r) and there are d passes, for a total
running time of Θ(d(n + 2^r)) = Θ((b/r)(n + 2^r)). ∎
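The digit decomposition in the proof maps naturally onto shifts and masks. The following sketch assumes nonnegative Python integers smaller than 2^b; the function name `radix_sort_bits` is hypothetical.

```python
def radix_sort_bits(a, b, r):
    """Sort nonnegative integers below 2**b, treating each key as r-bit digits."""
    mask = (1 << r) - 1                # largest digit value, k = 2^r - 1
    passes = -(-b // r)                # d = ceil(b / r) digits per key
    for p in range(passes):
        shift = p * r                  # bit offset of the p-th lowest digit
        count = [0] * (mask + 1)
        for x in a:
            count[(x >> shift) & mask] += 1
        for i in range(1, mask + 1):
            count[i] += count[i - 1]
        out = [0] * len(a)
        for x in reversed(a):          # stable placement, as in counting sort
            digit = (x >> shift) & mask
            count[digit] -= 1
            out[count[digit]] = x
        a = out
    return a


data = [0xDEADBEEF, 7, 0, 0xFFFFFFFF, 123456789, 7, 65536]
print(radix_sort_bits(data, 32, 8) == sorted(data))  # True
```

With b = 32 and r = 8 the loop runs four passes, each over n keys and 2^8 counters, as in the parenthetical example of the proof.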

For given values of n and b, we wish to choose the value of r, with r ≤ b,
that minimizes the expression (b/r)(n + 2^r). If b < ⌊lg n⌋, then for any value
of r ≤ b, we have that (n + 2^r) = Θ(n). Thus, choosing r = b yields a running
time of (b/b)(n + 2^b) = Θ(n), which is asymptotically optimal. If b ≥ ⌊lg n⌋,
then choosing r = ⌊lg n⌋ gives the best time to within a constant factor, which
we can see as follows. Choosing r = ⌊lg n⌋ yields a running time of Θ(bn / lg n).
As we increase r above ⌊lg n⌋, the 2^r term in the numerator increases faster than
the r term in the denominator, and so increasing r above ⌊lg n⌋ yields a running
time of Ω(bn / lg n). If instead we were to decrease r below ⌊lg n⌋, then the b/r
term increases and the n + 2^r term remains at Θ(n).
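To make the trade-off concrete, the sketch below evaluates a simplified cost model, ⌈b/r⌉ · (n + 2^r), for every admissible r. The model ignores constant factors and machine effects, and the function names `radix_cost` and `best_r` are assumptions for this illustration.

```python
def radix_cost(n, b, r):
    """Model cost of radix sort: ceil(b / r) passes, each costing n + 2^r."""
    return -(-b // r) * (n + 2 ** r)


def best_r(n, b):
    """Digit width r in 1..b that minimizes the model cost."""
    return min(range(1, b + 1), key=lambda r: radix_cost(n, b, r))


n, b = 2 ** 20, 32
print(best_r(n, b))                  # 16
print(radix_cost(n, b, best_r(n, b)))  # 2228224
print(radix_cost(n, b, 20))          # 4194304, using r = floor(lg n) = 20
```

For these values the model is minimized at r = 16 rather than at ⌊lg n⌋ = 20, but the cost at r = 20 is less than twice the minimum, which is what "best to within a constant factor" allows.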

Is radix sort preferable to a comparison-based sorting algorithm, such as quicksort?
If b = O(lg n), as is often the case, and we choose r ≈ lg n, then radix sort's
running time is Θ(n), which appears to be better than quicksort's expected running
time of Θ(n lg n). The constant factors hidden in the Θ-notation differ, however.
Although radix sort may make fewer passes than quicksort over the n keys, each
pass of radix sort may take significantly longer. Which sorting algorithm we prefer
depends on the characteristics of the implementations, of the underlying machine
(e.g., quicksort often uses hardware caches more effectively than radix sort), and
of the input data. Moreover, the version of radix sort that uses counting sort as the
intermediate stable sort does not sort in place, which many of the Θ(n lg n)-time
comparison sorts do. Thus, when primary memory storage is at a premium, we
might prefer an in-place algorithm such as quicksort.

Exercises 


8.3-1 

Using Figure 8.3 as a model, illustrate the operation of Radix-Sort on the following
list of English words: COW, DOG, SEA, RUG, ROW, MOB, BOX, TAB,
BAR, EAR, TAR, DIG, BIG, TEA, NOW, FOX.




8.3-2 

Which  of  the  following  sorting  algorithms  are  stable:  insertion  sort,  merge  sort, 
heapsort,  and  quicksort?  Give  a  simple  scheme  that  makes  any  sorting  algorithm 
stable.  How  much  additional  time  and  space  does  your  scheme  entail? 


8.3-3 

Use  induction  to  prove  that  radix  sort  works.  Where  does  your  proof  need  the 
assumption  that  the  intermediate  sort  is  stable? 


8.3-4

Show how to sort n integers in the range 0 to n^3 - 1 in O(n) time.


8.3-5 *

In the first card-sorting algorithm in this section, exactly how many sorting passes
are needed to sort d-digit decimal numbers in the worst case? How many piles of
cards would an operator need to keep track of in the worst case?


8.4  Bucket  sort 

Bucket  sort  assumes  that  the  input  is  drawn  from  a  uniform  distribution  and  has  an 
average-case  running  time  of  O(n).  Like  counting  sort,  bucket  sort  is  fast  because 
it  assumes  something  about  the  input.  Whereas  counting  sort  assumes  that  the  input 
consists  of  integers  in  a  small  range,  bucket  sort  assumes  that  the  input  is  generated 
by  a  random  process  that  distributes  elements  uniformly  and  independently  over 
the  interval  [0, 1).  (See  Section  C.2  for  a  definition  of  uniform  distribution.) 

Bucket sort divides the interval [0, 1) into n equal-sized subintervals, or buckets,
and then distributes the n input numbers into the buckets. Since the inputs are
uniformly and independently distributed over [0, 1), we do not expect many numbers
to fall into each bucket. To produce the output, we simply sort the numbers in each
bucket and then go through the buckets in order, listing the elements in each.

Our code for bucket sort assumes that the input is an n-element array A and
that each element A[i] in the array satisfies 0 ≤ A[i] < 1. The code requires an
auxiliary array B[0..n - 1] of linked lists (buckets) and assumes that there is a
mechanism for maintaining such lists. (Section 10.2 describes how to implement
basic operations on linked lists.)


Figure 8.4 The operation of BUCKET-SORT for n = 10. (a) The input array A[1..10]. (b) The
array B[0..9] of sorted lists (buckets) after line 8 of the algorithm. Bucket i holds values in the
half-open interval [i/10, (i + 1)/10). The sorted output consists of a concatenation in order of the
lists B[0], B[1], ..., B[9].


BUCKET-SORT(A)
1  let B[0..n - 1] be a new array
2  n = A.length
3  for i = 0 to n - 1
4      make B[i] an empty list
5  for i = 1 to n
6      insert A[i] into list B[⌊n A[i]⌋]
7  for i = 0 to n - 1
8      sort list B[i] with insertion sort
9  concatenate the lists B[0], B[1], ..., B[n - 1] together in order

Figure  8.4  shows  the  operation  of  bucket  sort  on  an  input  array  of  10  numbers. 

To see that this algorithm works, consider two elements A[i] and A[j]. Assume
without loss of generality that A[i] ≤ A[j]. Since ⌊nA[i]⌋ ≤ ⌊nA[j]⌋, either
element A[i] goes into the same bucket as A[j] or it goes into a bucket with a lower
index. If A[i] and A[j] go into the same bucket, then the for loop of lines 7-8 puts
them into the proper order. If A[i] and A[j] go into different buckets, then line 9
puts them into the proper order. Therefore, bucket sort works correctly.
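A Python sketch of the procedure is given below. It departs from the pseudocode in two labeled ways: ordinary Python lists stand in for the linked lists, and the bucket index is clamped to n - 1 as a guard against floating-point rounding of n · x. The function names and the sample input are assumptions for this illustration.

```python
def insertion_sort(lst):
    """Sort lst in place by insertion."""
    for j in range(1, len(lst)):
        key = lst[j]
        i = j - 1
        while i >= 0 and lst[i] > key:
            lst[i + 1] = lst[i]
            i -= 1
        lst[i + 1] = key


def bucket_sort(a):
    """Return a sorted copy of a, whose items are floats in [0, 1)."""
    n = len(a)
    buckets = [[] for _ in range(n)]           # lines 1-4: n empty buckets
    for x in a:                                # lines 5-6: bucket index floor(n * x)
        buckets[min(int(n * x), n - 1)].append(x)
    out = []
    for b in buckets:                          # lines 7-9: sort each bucket, concatenate
        insertion_sort(b)
        out.extend(b)
    return out


sample = [.78, .17, .39, .26, .72, .94, .21, .12, .23, .68]
print(bucket_sort(sample))
# [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]
```

Because a smaller value never lands in a higher-indexed bucket, concatenating the individually sorted buckets yields a fully sorted list.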

To analyze the running time, observe that all lines except line 8 take O(n) time
in the worst case. We need to analyze the total time taken by the n calls to insertion
sort in line 8.

To analyze the cost of the calls to insertion sort, let n_i be the random variable
denoting the number of elements placed in bucket B[i]. Since insertion sort runs
in quadratic time (see Section 2.2), the running time of bucket sort is

\[ T(n) = \Theta(n) + \sum_{i=0}^{n-1} O(n_i^2) . \]

We now analyze the average-case running time of bucket sort, by computing the
expected value of the running time, where we take the expectation over the input
distribution. Taking expectations of both sides and using linearity of expectation,
we have

\[
\begin{aligned}
E[T(n)] &= E\left[\Theta(n) + \sum_{i=0}^{n-1} O(n_i^2)\right] \\
&= \Theta(n) + \sum_{i=0}^{n-1} E\left[O(n_i^2)\right] && \text{(by linearity of expectation)} \\
&= \Theta(n) + \sum_{i=0}^{n-1} O\left(E[n_i^2]\right) && \text{(by equation (C.22))} .
\end{aligned}
\tag{8.1}
\]


We claim that

\[ E[n_i^2] = 2 - 1/n \tag{8.2} \]

for i = 0, 1, ..., n - 1. It is no surprise that each bucket i has the same value of
E[n_i^2], since each value in the input array A is equally likely to fall in any bucket.
To prove equation (8.2), we define indicator random variables

\[ X_{ij} = \mathrm{I}\{A[j] \text{ falls in bucket } i\} \]

for i = 0, 1, ..., n - 1 and j = 1, 2, ..., n. Thus,

\[ n_i = \sum_{j=1}^{n} X_{ij} . \]

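To compute E[n_i^2], we expand the square and regroup terms:

\[
\begin{aligned}
E[n_i^2] &= E\left[\left(\sum_{j=1}^{n} X_{ij}\right)^{2}\right] \\
&= E\left[\sum_{j=1}^{n} \sum_{k=1}^{n} X_{ij} X_{ik}\right] \\
&= E\left[\sum_{j=1}^{n} X_{ij}^2 + \sum_{1 \le j \le n} \sum_{\substack{1 \le k \le n \\ k \ne j}} X_{ij} X_{ik}\right] \\
&= \sum_{j=1}^{n} E[X_{ij}^2] + \sum_{1 \le j \le n} \sum_{\substack{1 \le k \le n \\ k \ne j}} E[X_{ij} X_{ik}] ,
\end{aligned}
\tag{8.3}
\]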


where the last line follows by linearity of expectation. We evaluate the two summations
separately. Indicator random variable X_ij is 1 with probability 1/n and 0
otherwise, and therefore

\[ E[X_{ij}^2] = 1^2 \cdot \frac{1}{n} + 0^2 \cdot \left(1 - \frac{1}{n}\right) = \frac{1}{n} . \]

When k ≠ j, the variables X_ij and X_ik are independent, and hence

\[ E[X_{ij} X_{ik}] = E[X_{ij}] \, E[X_{ik}] = \frac{1}{n} \cdot \frac{1}{n} = \frac{1}{n^2} . \]


Substituting these two expected values in equation (8.3), we obtain

\[
\begin{aligned}
E[n_i^2] &= \sum_{j=1}^{n} \frac{1}{n} + \sum_{1 \le j \le n} \sum_{\substack{1 \le k \le n \\ k \ne j}} \frac{1}{n^2} \\
&= n \cdot \frac{1}{n} + n(n-1) \cdot \frac{1}{n^2} \\
&= 1 + \frac{n-1}{n} \\
&= 2 - \frac{1}{n} ,
\end{aligned}
\]

which proves equation (8.2).

Using this expected value in equation (8.1), we conclude that the average-case
running time for bucket sort is Θ(n) + n · O(2 - 1/n) = Θ(n).
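Equation (8.2) can be checked exactly for small n. Under the uniform, independent input model, each of the n elements lands in a fixed bucket with probability 1/n, so the bucket size follows a binomial distribution with parameters n and 1/n. The sketch below sums m^2 times the binomial probability using exact rational arithmetic; the function name is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb


def expected_square_bucket_size(n):
    """Exact E[n_i^2] when n_i is binomial with parameters n and 1/n."""
    p = Fraction(1, n)
    return sum(m * m * comb(n, m) * p ** m * (1 - p) ** (n - m)
               for m in range(n + 1))


for n in (1, 2, 5, 10):
    # Each value should equal 2 - 1/n, as claimed in equation (8.2).
    print(n, expected_square_bucket_size(n), 2 - Fraction(1, n))
```

This is a numerical sanity check of the closed form, not a substitute for the indicator-variable proof.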

Even  if  the  input  is  not  drawn  from  a  uniform  distribution,  bucket  sort  may  still 
run  in  linear  time.  As  long  as  the  input  has  the  property  that  the  sum  of  the  squares 
of  the  bucket  sizes  is  linear  in  the  total  number  of  elements,  equation  (8.1)  tells  us 
that  bucket  sort  will  run  in  linear  time. 

Exercises 


8.4-1 

Using Figure 8.4 as a model, illustrate the operation of Bucket-Sort on the array
A = ⟨.79, .13, .16, .64, .39, .20, .89, .53, .71, .42⟩.


8.4-2 

Explain why the worst-case running time for bucket sort is Θ(n^2). What simple
change to the algorithm preserves its linear average-case running time and makes
its worst-case running time O(n lg n)?


8.4-3

Let X be a random variable that is equal to the number of heads in two flips of a
fair coin. What is E[X^2]? What is E^2[X]?


8.4-4 *

We are given n points in the unit circle, p_i = (x_i, y_i), such that 0 < x_i^2 + y_i^2 ≤ 1
for i = 1, 2, ..., n. Suppose that the points are uniformly distributed; that is, the
probability of finding a point in any region of the circle is proportional to the area
of that region. Design an algorithm with an average-case running time of Θ(n) to
sort the n points by their distances d_i = √(x_i^2 + y_i^2) from the origin. (Hint: Design
the bucket sizes in Bucket-Sort to reflect the uniform distribution of the points
in the unit circle.)

8.4-5 *

A probability distribution function P(x) for a random variable X is defined
by P(x) = Pr{X ≤ x}. Suppose that we draw a list of n random variables
X_1, X_2, ..., X_n from a continuous probability distribution function P that is
computable in O(1) time. Give an algorithm that sorts these numbers in linear
average-case time.


8-1 Probabilistic lower bounds on comparison sorting

In this problem, we prove a probabilistic Ω(n lg n) lower bound on the running time
of any deterministic or randomized comparison sort on n distinct input elements.
We begin by examining a deterministic comparison sort A with decision tree T_A.
We assume that every permutation of A's inputs is equally likely.

a. Suppose that each leaf of T_A is labeled with the probability that it is reached
given a random input. Prove that exactly n! leaves are labeled 1/n! and that the
rest are labeled 0.

b. Let D(T) denote the external path length of a decision tree T; that is, D(T)
is the sum of the depths of all the leaves of T. Let T be a decision tree with
k > 1 leaves, and let LT and RT be the left and right subtrees of T. Show that
D(T) = D(LT) + D(RT) + k.

c. Let d(k) be the minimum value of D(T) over all decision trees T with k > 1
leaves. Show that d(k) = min_{1 ≤ i ≤ k-1} {d(i) + d(k - i) + k}. (Hint: Consider
a decision tree T with k leaves that achieves the minimum. Let i_0 be the number
of leaves in LT and k - i_0 the number of leaves in RT.)

d. Prove that for a given value of k > 1 and i in the range 1 ≤ i ≤ k - 1, the
function i lg i + (k - i) lg(k - i) is minimized at i = k/2. Conclude that
d(k) = Ω(k lg k).

e. Prove that D(T_A) = Ω(n! lg(n!)), and conclude that the average-case time to
sort n elements is Ω(n lg n).

Now, consider a randomized comparison sort B. We can extend the decision-tree
model to handle randomization by incorporating two kinds of nodes: ordinary
comparison nodes and "randomization" nodes. A randomization node models a
random choice of the form Random(1, r) made by algorithm B; the node has r
children, each of which is equally likely to be chosen during an execution of the
algorithm.

f. Show that for any randomized comparison sort B, there exists a deterministic
comparison sort A whose expected number of comparisons is no more than
those made by B.


8-2 Sorting in place in linear time

Suppose  that  we  have  an  array  of  n  data  records  to  sort  and  that  the  key  of  each 
record  has  the  value  0  or  1 .  An  algorithm  for  sorting  such  a  set  of  records  might 
possess  some  subset  of  the  following  three  desirable  characteristics: 

1. The algorithm runs in O(n) time.

2.  The  algorithm  is  stable. 

3.  The  algorithm  sorts  in  place,  using  no  more  than  a  constant  amount  of  storage 
space  in  addition  to  the  original  array. 

a.  Give  an  algorithm  that  satisfies  criteria  1  and  2  above. 

b.  Give  an  algorithm  that  satisfies  criteria  1  and  3  above. 

c.  Give  an  algorithm  that  satisfies  criteria  2  and  3  above. 

d. Can you use any of your sorting algorithms from parts (a)-(c) as the sorting
method used in line 2 of Radix-Sort, so that Radix-Sort sorts n records
with b-bit keys in O(bn) time? Explain how or why not.

e. Suppose that the n records have keys in the range from 1 to k. Show how to
modify counting sort so that it sorts the records in place in O(n + k) time. You
may use O(k) storage outside the input array. Is your algorithm stable? (Hint:
How would you do it for k = 3?)

8-3  Sorting  variable-length  items 

a. You are given an array of integers, where different integers may have different
numbers of digits, but the total number of digits over all the integers in the array
is n. Show how to sort the array in O(n) time.

b. You are given an array of strings, where different strings may have different
numbers of characters, but the total number of characters over all the strings
is n. Show how to sort the strings in O(n) time.

(Note that the desired order here is the standard alphabetical order; for example,
a < ab < b.)

8-4  Water  jugs 

Suppose  that  you  are  given  n  red  and  n  blue  water  jugs,  all  of  different  shapes  and 
sizes.  All  red  jugs  hold  different  amounts  of  water,  as  do  the  blue  ones.  Moreover, 
for every red jug, there is a blue jug that holds the same amount of water, and vice
versa.

Your task is to find a grouping of the jugs into pairs of red and blue jugs that hold
the  same  amount  of  water.  To  do  so,  you  may  perform  the  following  operation:  pick 
a  pair  of  jugs  in  which  one  is  red  and  one  is  blue,  fill  the  red  jug  with  water,  and 
then  pour  the  water  into  the  blue  jug.  This  operation  will  tell  you  whether  the  red 
or  the  blue  jug  can  hold  more  water,  or  that  they  have  the  same  volume.  Assume 
that  such  a  comparison  takes  one  time  unit.  Your  goal  is  to  find  an  algorithm  that 
makes  a  minimum  number  of  comparisons  to  determine  the  grouping.  Remember 
that  you  may  not  directly  compare  two  red  jugs  or  two  blue  jugs. 

a. Describe a deterministic algorithm that uses Θ(n^2) comparisons to group the
jugs into pairs.

b. Prove a lower bound of Ω(n lg n) for the number of comparisons that an algorithm
solving this problem must make.

c. Give a randomized algorithm whose expected number of comparisons is
O(n lg n), and prove that this bound is correct. What is the worst-case number
of comparisons for your algorithm?

8-5  Average  sorting 

Suppose that, instead of sorting an array, we just require that the elements increase
on average. More precisely, we call an n-element array A k-sorted if, for all
i = 1, 2, ..., n - k, the following holds:

\[ \frac{\sum_{j=i}^{i+k-1} A[j]}{k} \le \frac{\sum_{j=i+1}^{i+k} A[j]}{k} . \]

a. What does it mean for an array to be 1-sorted?

b. Give a permutation of the numbers 1, 2, ..., 10 that is 2-sorted, but not sorted.

c. Prove that an n-element array is k-sorted if and only if A[i] ≤ A[i + k] for all
i = 1, 2, ..., n - k.

d. Give an algorithm that k-sorts an n-element array in O(n lg(n/k)) time.

We can also show a lower bound on the time to produce a k-sorted array, when k
is a constant.

e. Show that we can sort a k-sorted array of length n in O(n lg k) time. (Hint:
Use the solution to Exercise 6.5-9.)

f. Show that when k is a constant, k-sorting an n-element array requires Ω(n lg n)
time. (Hint: Use the solution to the previous part along with the lower bound
on comparison sorts.)


8-6 Lower bound on merging sorted lists

The problem of merging two sorted lists arises frequently. We have seen a procedure
for it as the subroutine Merge in Section 2.3.1. In this problem, we will
prove a lower bound of 2n - 1 on the worst-case number of comparisons required
to merge two sorted lists, each containing n items.

First we will show a lower bound of 2n - o(n) comparisons by using a decision
tree.

a. Given 2n numbers, compute the number of possible ways to divide them into
two sorted lists, each with n numbers.

b. Using a decision tree and your answer to part (a), show that any algorithm that
correctly merges two sorted lists must perform at least 2n - o(n) comparisons.

Now we will show a slightly tighter 2n - 1 bound.

c. Show that if two elements are consecutive in the sorted order and from different
lists, then they must be compared.

d. Use your answer to the previous part to show a lower bound of 2n - 1 comparisons
for merging two sorted lists.

8-7  The  0-1  sorting  lemma  and  columnsort 

A compare-exchange operation on two array elements A[i] and A[j], where i < j,
has the form

COMPARE-EXCHANGE(A, i, j)
1  if A[i] > A[j]
2      exchange A[i] with A[j]

After the compare-exchange operation, we know that A[i] ≤ A[j].

An  oblivious  compare-exchange  algorithm  operates  solely  by  a  sequence  of 
prespecified  compare-exchange  operations.  The  indices  of  the  positions  compared 
in  the  sequence  must  be  determined  in  advance,  and  although  they  can  depend 
on  the  number  of  elements  being  sorted,  they  cannot  depend  on  the  values  being 
sorted,  nor  can  they  depend  on  the  result  of  any  prior  compare-exchange  operation. 
For  example,  here  is  insertion  sort  expressed  as  an  oblivious  compare-exchange 
algorithm: 

INSERTION-SORT(A)
1  for j = 2 to A.length
2      for i = j - 1 downto 1
3          COMPARE-EXCHANGE(A, i, i + 1)

The 0-1 sorting lemma provides a powerful way to prove that an oblivious
compare-exchange algorithm produces a sorted result. It states that if an oblivious
compare-exchange algorithm correctly sorts all input sequences consisting of
only 0s and 1s, then it correctly sorts all inputs containing arbitrary values.
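The lemma suggests a brute-force check for small n: run a fixed compare-exchange schedule on all 2^n inputs of 0s and 1s. The sketch below does this for the oblivious insertion sort above, using 0-based indices. Representing an oblivious algorithm as a list of index pairs, and every function name here, are assumptions made for this illustration.

```python
from itertools import product


def compare_exchange(a, i, j):
    """Swap a[i] and a[j] if they are out of order (requires i < j)."""
    if a[i] > a[j]:
        a[i], a[j] = a[j], a[i]


def insertion_sort_schedule(n):
    """The fixed (i, i + 1) pairs, 0-based, that oblivious insertion sort applies."""
    return [(i, i + 1) for j in range(1, n) for i in range(j - 1, -1, -1)]


def apply_schedule(values, schedule):
    """Run the compare-exchange sequence on a copy of values."""
    a = list(values)
    for i, j in schedule:
        compare_exchange(a, i, j)
    return a


def sorts_all_zero_one_inputs(schedule, n):
    """Check the schedule on every length-n sequence of 0s and 1s."""
    return all(apply_schedule(bits, schedule) == sorted(bits)
               for bits in product((0, 1), repeat=n))


schedule = insertion_sort_schedule(5)
print(sorts_all_zero_one_inputs(schedule, 5))       # True
print(apply_schedule([3, 1, 4, 1, 5], schedule))    # [1, 1, 3, 4, 5]
print(sorts_all_zero_one_inputs(schedule[:-1], 5))  # False: last comparison removed
```

The schedule depends only on n, never on the data, which is exactly the obliviousness the lemma requires; dropping even one comparison is caught by some 0-1 input.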

You will prove the 0-1 sorting lemma by proving its contrapositive: if an oblivious
compare-exchange algorithm fails to sort an input containing arbitrary values,
then it fails to sort some 0-1 input. Assume that an oblivious compare-exchange
algorithm X fails to correctly sort the array A[1..n]. Let A[p] be the smallest value
in A that algorithm X puts into the wrong location, and let A[q] be the value that
algorithm X moves to the location into which A[p] should have gone. Define an
array B[1..n] of 0s and 1s as follows:

\[
B[i] =
\begin{cases}
0 & \text{if } A[i] \le A[p] , \\
1 & \text{if } A[i] > A[p] .
\end{cases}
\]

a.  Argue  that  A[q\  >  A[p],  so  that  B[p]  =  0  and  B[q]  =  1. 

b.  To  complete  the  proof  of  the  0- 1  sorting  lemma,  prove  that  algorithm  X  fails  to 
sort  array  B  correctly. 

Now you will use the 0-1 sorting lemma to prove that a particular sorting algorithm
works correctly. The algorithm, columnsort, works on a rectangular array
of n elements. The array has r rows and s columns (so that n = rs), subject to
three restrictions:

•  r must be even,

•  s must be a divisor of r, and

•  r ≥ 2s².

When columnsort completes, the array is sorted in column-major order: reading
down the columns, from left to right, the elements monotonically increase.

Columnsort operates in eight steps, regardless of the value of n. The odd steps
are all the same: sort each column individually. Each even step is a fixed permutation.
Here are the steps:

1. Sort each column.

2. Transpose the array, but reshape it back to r rows and s columns. In other
words, turn the leftmost column into the top r/s rows, in order; turn the next
column into the next r/s rows, in order; and so on.

3. Sort each column.

4. Perform the inverse of the permutation performed in step 2.

5. Sort each column.

6. Shift the top half of each column into the bottom half of the same column, and
shift the bottom half of each column into the top half of the next column to the
right. Leave the top half of the leftmost column empty. Shift the bottom half
of the last column into the top half of a new rightmost column, and leave the
bottom half of this new column empty.

7. Sort each column.

8. Perform the inverse of the permutation performed in step 6.

Figure 8.5 shows an example of the steps of columnsort with r = 6 and s = 3.
(Even though this example violates the requirement that r ≥ 2s², it happens to
work.)

Figure 8.5 The steps of columnsort. (a) The input array with 6 rows and 3 columns. (b) After
sorting each column in step 1. (c) After transposing and reshaping in step 2. (d) After sorting each
column in step 3. (e) After performing step 4, which inverts the permutation from step 2. (f) After
sorting each column in step 5. (g) After shifting by half a column in step 6. (h) After sorting each
column in step 7. (i) After performing step 8, which inverts the permutation from step 6. The array
is now sorted in column-major order.
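
The eight steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the problem's intended solution: the array is stored as a flat list in column-major order (column j occupies positions j·r through (j + 1)·r - 1), the function name `columnsort` and its helpers are made up for illustration, and Python's built-in `sorted` stands in for whatever method sorts each column. Steps 6-8 are folded into one loop: shifting by half a column, sorting, and shifting back amounts to sorting each window of r consecutive elements that straddles two adjacent columns.

```python
def columnsort(a, r, s):
    """Sort n = r*s values held in column-major order; returns a new list."""
    n = r * s
    if len(a) != n or r % 2 != 0 or r % s != 0 or r < 2 * s * s:
        raise ValueError("need len(a) == r*s, r even, s dividing r, r >= 2*s*s")
    a = list(a)

    def sort_columns(x):
        for j in range(s):
            x[j * r:(j + 1) * r] = sorted(x[j * r:(j + 1) * r])

    def transpose_reshape(x):
        # Step 2: the k-th entry in column-major order moves to
        # row k // s, column k % s of the reshaped array.
        y = [None] * n
        for k in range(n):
            row, col = divmod(k, s)
            y[col * r + row] = x[k]
        return y

    def inverse_transpose_reshape(x):
        # Step 4: undo the permutation of step 2.
        y = [None] * n
        for k in range(n):
            row, col = divmod(k, s)
            y[k] = x[col * r + row]
        return y

    sort_columns(a)                      # step 1
    a = transpose_reshape(a)             # step 2
    sort_columns(a)                      # step 3
    a = inverse_transpose_reshape(a)     # step 4
    sort_columns(a)                      # step 5
    half = r // 2
    for j in range(s - 1):               # steps 6-8
        lo = j * r + half
        a[lo:lo + r] = sorted(a[lo:lo + r])
    return a
```

The first and last half-columns created by the shift in step 6 need no work in this formulation, because step 5 has already left them sorted.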

c. Argue that we can treat columnsort as an oblivious compare-exchange algorithm,
even if we do not know what sorting method the odd steps use.

Although it might seem hard to believe that columnsort actually sorts, you will
use the 0-1 sorting lemma to prove that it does. The 0-1 sorting lemma applies
because we can treat columnsort as an oblivious compare-exchange algorithm. A
couple of definitions will help you apply the 0-1 sorting lemma. We say that an area
of an array is clean if we know that it contains either all 0s or all 1s. Otherwise,
the area might contain mixed 0s and 1s, and it is dirty. From here on, assume that
the input array contains only 0s and 1s, and that we can treat it as an array with r
rows and s columns.

d. Prove that after steps 1-3, the array consists of some clean rows of 0s at the top,
some clean rows of 1s at the bottom, and at most s dirty rows between them.

e. Prove that after step 4, the array, read in column-major order, starts with a clean
area of 0s, ends with a clean area of 1s, and has a dirty area of at most s²
elements in the middle.

f. Prove that steps 5-8 produce a fully sorted 0-1 output. Conclude that columnsort
correctly sorts all inputs containing arbitrary values.

g. Now suppose that s does not divide r. Prove that after steps 1-3, the array
consists of some clean rows of 0s at the top, some clean rows of 1s at the
bottom, and at most 2s - 1 dirty rows between them. How large must r be,
compared with s, for columnsort to correctly sort when s does not divide r?

h. Suggest a simple change to step 1 that allows us to maintain the requirement
that r ≥ 2s² even when s does not divide r, and prove that with your change,
columnsort correctly sorts.


Chapter  notes 

The  decision-tree  model  for  studying  comparison  sorts  was  introduced  by  Ford 
and  Johnson  [110].  Knuth’s  comprehensive  treatise  on  sorting  [211]  covers  many 
variations  on  the  sorting  problem,  including  the  information-theoretic  lower  bound 
on  the  complexity  of  sorting  given  here.  Ben-Or  [39]  studied  lower  bounds  for 
sorting  using  generalizations  of  the  decision-tree  model. 

Knuth credits H. H. Seward with inventing counting sort in 1954, as well as with
the idea of combining counting sort with radix sort. Radix sorting starting with the
least significant digit appears to be a folk algorithm widely used by operators of
mechanical card-sorting machines. According to Knuth, the first published reference
to the method is a 1929 document by L. J. Comrie describing punched-card
equipment. Bucket sorting has been in use since 1956, when the basic idea was
proposed by E. J. Isaac and R. C. Singleton [188].

Munro and Raman [263] give a stable sorting algorithm that performs O(n^(1+ε))
comparisons in the worst case, where 0 < ε ≤ 1 is any fixed constant. Although
any of the O(n lg n)-time algorithms make fewer comparisons, the algorithm by
Munro and Raman moves data only O(n) times and operates in place.

The case of sorting n b-bit integers in o(n lg n) time has been considered by
many researchers. Several positive results have been obtained, each under slightly
different assumptions about the model of computation and the restrictions placed
on the algorithm. All the results assume that the computer memory is divided into
addressable b-bit words. Fredman and Willard [115] introduced the fusion tree data
structure and used it to sort n integers in O(n lg n / lg lg n) time. This bound was
later improved to O(n √(lg n)) time by Andersson [16]. These algorithms require
the use of multiplication and several precomputed constants. Andersson, Hagerup,
Nilsson, and Raman [17] have shown how to sort n integers in O(n lg lg n) time
without using multiplication, but their method requires storage that can be
unbounded in terms of n. Using multiplicative hashing, we can reduce the storage
needed to O(n), but then the O(n lg lg n) worst-case bound on the running time
becomes an expected-time bound. Generalizing the exponential search trees of
Andersson [16], Thorup [335] gave an O(n (lg lg n)²)-time sorting algorithm that
does not use multiplication or randomization, and it uses linear space. Combining
these techniques with some new ideas, Han [158] improved the bound for sorting
to O(n lg lg n lg lg lg n) time. Although these algorithms are important theoretical
breakthroughs, they are all fairly complicated and at the present time seem unlikely
to compete with existing sorting algorithms in practice.

The columnsort algorithm in Problem 8-7 is by Leighton [227].

The ith order statistic of a set of n elements is the ith smallest element. For
example, the minimum of a set of elements is the first order statistic (i = 1),
and the maximum is the nth order statistic (i = n). A median, informally, is
the “halfway point” of the set. When n is odd, the median is unique, occurring at
i = (n + 1)/2. When n is even, there are two medians, occurring at i = n/2 and
i = n/2 + 1. Thus, regardless of the parity of n, medians occur at i = ⌊(n + 1)/2⌋
(the lower median) and i = ⌈(n + 1)/2⌉ (the upper median). For simplicity in
this text, however, we consistently use the phrase “the median” to refer to the lower
median.

This chapter addresses the problem of selecting the ith order statistic from a
set of n distinct numbers. We assume for convenience that the set contains distinct
numbers, although virtually everything that we do extends to the situation in
which a set contains repeated values. We formally specify the selection problem
as follows:

Input: A set A of n (distinct) numbers and an integer i, with 1 ≤ i ≤ n.

Output: The element x ∈ A that is larger than exactly i - 1 other elements of A.

We can solve the selection problem in O(n lg n) time, since we can sort the numbers
using heapsort or merge sort and then simply index the ith element in the
output array. This chapter presents faster algorithms.
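
For reference, the sort-and-index baseline and the two median positions defined above might look like this in Python. It is a sketch only: the function names are invented for illustration, `sorted` stands in for heapsort or merge sort, and ranks are 1-based to match the text while Python lists are 0-based.

```python
def select_by_sorting(values, i):
    """Return the ith smallest element (1-based i) by sorting: O(n lg n) time."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    return sorted(values)[i - 1]


def lower_median(values):
    """The order statistic at i = floor((n + 1) / 2)."""
    return select_by_sorting(values, (len(values) + 1) // 2)


def upper_median(values):
    """The order statistic at i = ceil((n + 1) / 2), computed as (n + 2) // 2."""
    return select_by_sorting(values, (len(values) + 2) // 2)
```

For odd n the two medians coincide; for even n they are the two middle elements of the sorted order.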

In Section 9.1, we examine the problem of selecting the minimum and maximum
of a set of elements. More interesting is the general selection problem, which
we investigate in the subsequent two sections. Section 9.2 analyzes a practical
randomized algorithm that achieves an O(n) expected running time, assuming distinct
elements. Section 9.3 contains an algorithm of more theoretical interest that
achieves the O(n) running time in the worst case.

9.1 Minimum and maximum

How many comparisons are necessary to determine the minimum of a set of n
elements? We can easily obtain an upper bound of n - 1 comparisons: examine
each element of the set in turn and keep track of the smallest element seen so
far. In the following procedure, we assume that the set resides in array A, where
A.length = n.

Minimum(A)
    min = A[1]
    for i = 2 to A.length
        if min > A[i]
            min = A[i]
    return min

We can, of course, find the maximum with n - 1 comparisons as well.

Is this the best we can do? Yes, since we can obtain a lower bound of n - 1
comparisons for the problem of determining the minimum. Think of any algorithm
that determines the minimum as a tournament among the elements. Each comparison
is a match in the tournament in which the smaller of the two elements wins.
Observing that every element except the winner must lose at least one match, we
conclude that n - 1 comparisons are necessary to determine the minimum. Hence,
the algorithm Minimum is optimal with respect to the number of comparisons
performed.

Simultaneous  minimum  and  maximum 

In some applications, we must find both the minimum and the maximum of a set
of n elements. For example, a graphics program may need to scale a set of (x, y)
data to fit onto a rectangular display screen or other graphical output device. To
do so, the program must first determine the minimum and maximum value of each
coordinate.

At this point, it should be obvious how to determine both the minimum and the
maximum of n elements using Θ(n) comparisons, which is asymptotically optimal:
simply find the minimum and maximum independently, using n - 1 comparisons
for each, for a total of 2n - 2 comparisons.

In fact, we can find both the minimum and the maximum using at most 3⌊n/2⌋
comparisons. We do so by maintaining both the minimum and maximum elements
seen thus far. Rather than processing each element of the input by comparing it
against the current minimum and maximum, at a cost of 2 comparisons per element,
we process elements in pairs. We compare pairs of elements from the input first
with each other, and then we compare the smaller with the current minimum and
the larger to the current maximum, at a cost of 3 comparisons for every 2 elements.

How we set up initial values for the current minimum and maximum depends
on whether n is odd or even. If n is odd, we set both the minimum and maximum
to the value of the first element, and then we process the rest of the elements in
pairs. If n is even, we perform 1 comparison on the first 2 elements to determine
the initial values of the minimum and maximum, and then process the rest of the
elements in pairs as in the case for odd n.

Let us analyze the total number of comparisons. If n is odd, then we perform
3⌊n/2⌋ comparisons. If n is even, we perform 1 initial comparison followed by
3(n - 2)/2 comparisons, for a total of 3n/2 - 2. Thus, in either case, the total
number of comparisons is at most 3⌊n/2⌋.
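
The pairing scheme just described can be sketched in Python as follows. The function name `min_max` and the returned comparison count are assumptions added for illustration; the count tallies only comparisons between input elements, so that it can be checked against the 3⌊n/2⌋ bound.

```python
def min_max(a):
    """Return (minimum, maximum, comparisons used) for a nonempty list a."""
    n = len(a)
    if n == 0:
        raise ValueError("min_max() needs at least one element")
    comparisons = 0
    if n % 2 == 1:
        # Odd n: the first element is both the initial minimum and maximum.
        lo = hi = a[0]
        start = 1
    else:
        # Even n: one comparison orders the first two elements.
        comparisons += 1
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for k in range(start, n, 2):
        x, y = a[k], a[k + 1]
        comparisons += 3                  # pair, then min, then max
        small, large = (x, y) if x < y else (y, x)
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons
```

On a 10-element input this uses 1 + 3·4 = 13 comparisons, matching 3n/2 - 2, whereas two independent scans would use 2n - 2 = 18.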

Exercises 


9.1-1

Show that the second smallest of n elements can be found with n + ⌈lg n⌉ - 2
comparisons in the worst case. (Hint: Also find the smallest element.)

9.1-2 *

Prove the lower bound of ⌈3n/2⌉ - 2 comparisons in the worst case to find both
the maximum and minimum of n numbers. (Hint: Consider how many numbers
are potentially either the maximum or minimum, and investigate how a comparison
affects these counts.)


9.2  Selection  in  expected  linear  time 

The general selection problem appears more difficult than the simple problem of
finding a minimum. Yet, surprisingly, the asymptotic running time for both problems
is the same: Θ(n). In this section, we present a divide-and-conquer algorithm
for the selection problem. The algorithm Randomized-Select is modeled after
the quicksort algorithm of Chapter 7. As in quicksort, we partition the input array
recursively. But unlike quicksort, which recursively processes both sides of the
partition, Randomized-Select works on only one side of the partition. This
difference shows up in the analysis: whereas quicksort has an expected running
time of Θ(n lg n), the expected running time of Randomized-Select is Θ(n),
assuming that the elements are distinct.

Randomized-Select uses the procedure Randomized-Partition introduced
in Section 7.3. Thus, like Randomized-Quicksort, it is a randomized algorithm,
since its behavior is determined in part by the output of a random-number
generator. The following code for Randomized-Select returns the ith smallest
element of the array A[p..r].

Randomized-Select(A, p, r, i)
1  if p == r
2      return A[p]
3  q = Randomized-Partition(A, p, r)
4  k = q - p + 1
5  if i == k          // the pivot value is the answer
6      return A[q]
7  elseif i < k
8      return Randomized-Select(A, p, q - 1, i)
9  else return Randomized-Select(A, q + 1, r, i - k)

The Randomized-Select procedure works as follows. Line 1 checks for the
base case of the recursion, in which the subarray A[p..r] consists of just one
element. In this case, i must equal 1, and we simply return A[p] in line 2 as the
ith smallest element. Otherwise, the call to Randomized-Partition in line 3
partitions the array A[p..r] into two (possibly empty) subarrays A[p..q - 1]
and A[q + 1..r] such that each element of A[p..q - 1] is less than or equal
to A[q], which in turn is less than each element of A[q + 1..r]. As in quicksort,
we will refer to A[q] as the pivot element. Line 4 computes the number k of
elements in the subarray A[p..q], that is, the number of elements in the low side
of the partition, plus one for the pivot element. Line 5 then checks whether A[q] is
the ith smallest element. If it is, then line 6 returns A[q]. Otherwise, the algorithm
determines in which of the two subarrays A[p..q - 1] and A[q + 1..r] the ith
smallest element lies. If i < k, then the desired element lies on the low side of
the partition, and line 8 recursively selects it from the subarray. If i > k, however,
then the desired element lies on the high side of the partition. Since we already
know k values that are smaller than the ith smallest element of A[p..r] (namely,
the elements of A[p..q]), the desired element is the (i - k)th smallest element
of A[q + 1..r], which line 9 finds recursively. The code appears to allow recursive
calls to subarrays with 0 elements, but Exercise 9.2-1 asks you to show that this
situation cannot happen.
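
A direct Python transcription of the procedure might look like the sketch below. Several details are assumptions of the sketch rather than of the text: indices p and r are 0-based and inclusive while the rank i stays 1-based, the partition routine is a Lomuto-style partition written here for self-containment (Section 7.3 is not reproduced), and the optional `rng` parameter is added only so that a seeded generator can be passed in.

```python
import random


def randomized_partition(a, p, r, rng=random):
    """Partition a[p..r] (inclusive) around a uniformly random pivot.

    Returns the final index q of the pivot; rearranges a in place.
    """
    k = rng.randint(p, r)            # random pivot position, p <= k <= r
    a[k], a[r] = a[r], a[k]
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a, p, r, i, rng=random):
    """Return the ith smallest element (1-based i) of a[p..r]."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r, rng)
    k = q - p + 1                    # size of the low side plus the pivot
    if i == k:
        return a[q]                  # the pivot value is the answer
    elif i < k:
        return randomized_select(a, p, q - 1, i, rng)
    else:
        return randomized_select(a, q + 1, r, i - k, rng)
```

A call such as `randomized_select(list(data), 0, len(data) - 1, i)` works on a copy, since the procedure rearranges its input.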

The worst-case running time for Randomized-Select is Θ(n²), even to find
the minimum, because we could be extremely unlucky and always partition around
the largest remaining element, and partitioning takes Θ(n) time. We will see that
the algorithm has a linear expected running time, though, and because it is randomized,
no particular input elicits the worst-case behavior.

To analyze the expected running time of Randomized-Select, we let the running
time on an input array A[p..r] of n elements be a random variable that we
denote by T(n), and we obtain an upper bound on E[T(n)] as follows. The procedure
Randomized-Partition is equally likely to return any element as the
pivot. Therefore, for each k such that 1 ≤ k ≤ n, the subarray A[p..q] has k elements
(all less than or equal to the pivot) with probability 1/n. For k = 1, 2, ..., n,
we define indicator random variables X_k where

X_k = I{the subarray A[p..q] has exactly k elements},

and so, assuming that the elements are distinct, we have

E[X_k] = 1/n.    (9.1)

When we call Randomized-Select and choose A[q] as the pivot element, we
do not know, a priori, if we will terminate immediately with the correct answer,
recurse on the subarray A[p..q - 1], or recurse on the subarray A[q + 1..r].
This decision depends on where the ith smallest element falls relative to A[q].
Assuming that T(n) is monotonically increasing, we can upper-bound the time
needed for the recursive call by the time needed for the recursive call on the largest
possible input. In other words, to obtain an upper bound, we assume that the ith
element is always on the side of the partition with the greater number of elements.
For a given call of Randomized-Select, the indicator random variable X_k has
the value 1 for exactly one value of k, and it is 0 for all other k. When X_k = 1, the
two subarrays on which we might recurse have sizes k - 1 and n - k. Hence, we
have the recurrence

\[
\begin{aligned}
T(n) &\le \sum_{k=1}^{n} X_k \cdot \bigl( T(\max(k-1,\, n-k)) + O(n) \bigr) \\
     &=   \sum_{k=1}^{n} X_k \cdot T(\max(k-1,\, n-k)) + O(n).
\end{aligned}
\]

Taking expected values, we have

\[
\begin{aligned}
\mathrm{E}[T(n)]
  &\le \mathrm{E}\Bigl[ \sum_{k=1}^{n} X_k \cdot T(\max(k-1,\, n-k)) + O(n) \Bigr] \\
  &=   \sum_{k=1}^{n} \mathrm{E}[X_k \cdot T(\max(k-1,\, n-k))] + O(n)
       && \text{(by linearity of expectation)} \\
  &=   \sum_{k=1}^{n} \mathrm{E}[X_k] \cdot \mathrm{E}[T(\max(k-1,\, n-k))] + O(n)
       && \text{(by equation (C.24))} \\
  &=   \sum_{k=1}^{n} \frac{1}{n} \cdot \mathrm{E}[T(\max(k-1,\, n-k))] + O(n)
       && \text{(by equation (9.1)).}
\end{aligned}
\]

In order to apply equation (C.24), we rely on X_k and T(max(k - 1, n - k)) being
independent random variables. Exercise 9.2-2 asks you to justify this assertion.
Let us consider the expression max(k - 1, n - k). We have

\[
\max(k-1,\, n-k) =
\begin{cases}
k - 1 & \text{if } k > \lceil n/2 \rceil, \\
n - k & \text{if } k \le \lceil n/2 \rceil.
\end{cases}
\]

If n is even, each term from T(⌈n/2⌉) up to T(n - 1) appears exactly twice in
the summation, and if n is odd, all these terms appear twice and T(⌊n/2⌋) appears
once. Thus, we have

\[
\mathrm{E}[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} \mathrm{E}[T(k)] + O(n).
\]
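
As a small worked check of the counting argument (an illustration added here, not part of the original derivation), the cases n = 4 and n = 5 can be written out term by term. For odd n the bound counts the term for ⌊n/2⌋ twice although it appears only once, which is why the result is an inequality.

```latex
% n = 4: k = 1, 2, 3, 4 gives max(k-1, n-k) = 3, 2, 2, 3
\sum_{k=1}^{4} \mathrm{E}[T(\max(k-1,\, 4-k))]
  = 2\,\mathrm{E}[T(2)] + 2\,\mathrm{E}[T(3)]
  = 2 \sum_{k=\lfloor 4/2 \rfloor}^{3} \mathrm{E}[T(k)]

% n = 5: k = 1, ..., 5 gives max(k-1, n-k) = 4, 3, 2, 3, 4
\sum_{k=1}^{5} \mathrm{E}[T(\max(k-1,\, 5-k))]
  = \mathrm{E}[T(2)] + 2\,\mathrm{E}[T(3)] + 2\,\mathrm{E}[T(4)]
  \le 2 \sum_{k=\lfloor 5/2 \rfloor}^{4} \mathrm{E}[T(k)]
```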

We show that E[T(n)] = O(n) by substitution. Assume that E[T(n)] ≤ cn for
some constant c that satisfies the initial conditions of the recurrence. We assume
that T(n) = O(1) for n less than some constant; we shall pick this constant later.
We also pick a constant a such that the function described by the O(n) term above
(which describes the non-recursive component of the running time of the algorithm)
is bounded from above by an for all n > 0. Using this inductive hypothesis,
we have

\[
\begin{aligned}
\mathrm{E}[T(n)]
  &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + an \\
  &=   \frac{2c}{n} \Bigl( \sum_{k=1}^{n-1} k - \sum_{k=1}^{\lfloor n/2 \rfloor - 1} k \Bigr) + an \\
  &=   \frac{2c}{n} \Bigl( \frac{(n-1)n}{2} - \frac{(\lfloor n/2 \rfloor - 1)\lfloor n/2 \rfloor}{2} \Bigr) + an \\
  &\le \frac{2c}{n} \Bigl( \frac{(n-1)n}{2} - \frac{(n/2 - 2)(n/2 - 1)}{2} \Bigr) + an \\
  &=   \frac{2c}{n} \Bigl( \frac{n^2 - n}{2} - \frac{n^2/4 - 3n/2 + 2}{2} \Bigr) + an \\
  &=   \frac{c}{n} \Bigl( \frac{3n^2}{4} + \frac{n}{2} - 2 \Bigr) + an \\
  &=   c \Bigl( \frac{3n}{4} + \frac{1}{2} - \frac{2}{n} \Bigr) + an \\
  &\le \frac{3cn}{4} + \frac{c}{2} + an \\
  &=   cn - \Bigl( \frac{cn}{4} - \frac{c}{2} - an \Bigr).
\end{aligned}
\]

In order to complete the proof, we need to show that for sufficiently large n, this
last expression is at most cn or, equivalently, that cn/4 - c/2 - an ≥ 0. If we
add c/2 to both sides and factor out n, we get n(c/4 - a) ≥ c/2. As long as we
choose the constant c so that c/4 - a > 0, i.e., c > 4a, we can divide both sides
by c/4 - a, giving

\[
n \ge \frac{c/2}{c/4 - a} = \frac{2c}{c - 4a}.
\]

Thus, if we assume that T(n) = O(1) for n < 2c/(c - 4a), then E[T(n)] = O(n).
We conclude that we can find any order statistic, and in particular the median, in
expected linear time, assuming that the elements are distinct.


Exercises 


9.2-1 


Show that Randomized-Select never makes a recursive call to a 0-length array.

9.2-2

Argue that the indicator random variable X_k and the value T(max(k - 1, n - k))
are independent.

9.2-3

Write an iterative version of Randomized-Select.

9.2-4

Suppose we use Randomized-Select to select the minimum element of the
array A = ⟨3, 2, 9, 0, 7, 5, 4, 8, 6, 1⟩. Describe a sequence of partitions that results
in a worst-case performance of Randomized-Select.


9.3  Selection  in  worst-case  linear  time 

We now examine a selection algorithm whose running time is O(n) in the worst
case. Like Randomized-Select, the algorithm Select finds the desired element
by recursively partitioning the input array. Here, however, we guarantee a
good split upon partitioning the array. Select uses the deterministic partitioning
algorithm Partition from quicksort (see Section 7.1), but modified to take the
element to partition around as an input parameter.

The Select algorithm determines the ith smallest of an input array of n > 1
distinct elements by executing the following steps. (If n = 1, then Select merely
returns its only input value as the ith smallest.)

1. Divide the n elements of the input array into ⌊n/5⌋ groups of 5 elements each
and at most one group made up of the remaining n mod 5 elements.

2. Find the median of each of the ⌈n/5⌉ groups by first insertion-sorting the elements
of each group (of which there are at most 5) and then picking the median
from the sorted list of group elements.

3. Use Select recursively to find the median x of the ⌈n/5⌉ medians found in
step 2. (If there are an even number of medians, then by our convention, x is
the lower median.)

4. Partition the input array around the median-of-medians x using the modified
version of Partition. Let k be one more than the number of elements on the
low side of the partition, so that x is the kth smallest element and there are n - k
elements on the high side of the partition.

5. If i = k, then return x. Otherwise, use Select recursively to find the ith
smallest element on the low side if i < k, or the (i - k)th smallest element on
the high side if i > k.
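
The five steps above can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the book's code: indices p and r are 0-based and inclusive while the rank i is 1-based, the helper name `partition_around` is invented here, Python's `sorted` replaces insertion sort on each group of at most 5 elements, and the input values are assumed distinct as in the text.

```python
def partition_around(a, p, r, x):
    """Partition a[p..r] (inclusive) around the value x, which must occur there.

    Returns the final index of x; rearranges a in place.
    """
    k = a.index(x, p, r + 1)
    a[k], a[r] = a[r], a[k]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def select(a, p, r, i):
    """Return the ith smallest (1-based i) of the distinct values a[p..r]."""
    n = r - p + 1
    if n == 1:
        return a[p]
    # Steps 1-2: lower median of each group of (at most) 5 elements.
    medians = []
    for start in range(p, r + 1, 5):
        group = sorted(a[start:min(start + 5, r + 1)])
        medians.append(group[(len(group) - 1) // 2])
    # Step 3: lower median of the medians, found recursively.
    m = len(medians)
    x = select(medians, 0, m - 1, (m + 1) // 2)
    # Step 4: partition around x; k is the rank of x within a[p..r].
    q = partition_around(a, p, r, x)
    k = q - p + 1
    # Step 5: return x or recurse on the side that contains the answer.
    if i == k:
        return x
    elif i < k:
        return select(a, p, q - 1, i)
    else:
        return select(a, q + 1, r, i - k)
```

Like the randomized version, this rearranges its input, so callers that need the original order should pass a copy.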

To analyze the running time of Select, we first determine a lower bound on the
number of elements that are greater than the partitioning element x. Figure 9.1
helps us to visualize this bookkeeping. At least half of the medians found in
step 2 are greater than or equal to the median-of-medians x.¹ Thus, at least half
of the ⌈n/5⌉ groups contribute at least 3 elements that are greater than x, except
for the one group that has fewer than 5 elements if 5 does not divide n exactly, and
the one group containing x itself. Discounting these two groups, it follows that the
number of elements greater than x is at least

\[
3 \left( \left\lceil \frac{1}{2} \left\lceil \frac{n}{5} \right\rceil \right\rceil - 2 \right)
\ge \frac{3n}{10} - 6.
\]

Similarly, at least 3n/10 - 6 elements are less than x. Thus, in the worst case,
step 5 calls Select recursively on at most 7n/10 + 6 elements.

Figure 9.1 Analysis of the algorithm Select. The n elements are represented by small circles,
and each group of 5 elements occupies a column. The medians of the groups are whitened, and the
median-of-medians x is labeled. (When finding the median of an even number of elements, we use
the lower median.) Arrows go from larger elements to smaller, from which we can see that 3 out
of every full group of 5 elements to the right of x are greater than x, and 3 out of every group of 5
elements to the left of x are less than x. The elements known to be greater than x appear on a shaded
background.

We can now develop a recurrence for the worst-case running time T(n) of the
algorithm Select. Steps 1, 2, and 4 take O(n) time. (Step 2 consists of O(n)
calls of insertion sort on sets of size O(1).) Step 3 takes time T(⌈n/5⌉), and step 5
takes time at most T(7n/10 + 6), assuming that T is monotonically increasing.
We make the assumption, which seems unmotivated at first, that any input of fewer
than 140 elements requires O(1) time; the origin of the magic constant 140 will be
clear shortly. We can therefore obtain the recurrence

\[
T(n) \le
\begin{cases}
O(1) & \text{if } n < 140, \\
T(\lceil n/5 \rceil) + T(7n/10 + 6) + O(n) & \text{if } n \ge 140.
\end{cases}
\]

¹ Because of our assumption that the numbers are distinct, all medians except x are either greater
than or less than x.

We show that the running time is linear by substitution. More specifically, we will
show that T(n) ≤ cn for some suitably large constant c and all n > 0. We begin by
assuming that T(n) ≤ cn for some suitably large constant c and all n < 140; this
assumption holds if c is large enough. We also pick a constant a such that the function
described by the O(n) term above (which describes the non-recursive component
of the running time of the algorithm) is bounded above by an for all n > 0.
Substituting this inductive hypothesis into the right-hand side of the recurrence
yields

\[
\begin{aligned}
T(n) &\le c \lceil n/5 \rceil + c(7n/10 + 6) + an \\
     &\le cn/5 + c + 7cn/10 + 6c + an \\
     &=   9cn/10 + 7c + an \\
     &=   cn + (-cn/10 + 7c + an),
\end{aligned}
\]

which is at most cn if

\[
-cn/10 + 7c + an \le 0. \tag{9.2}
\]

Inequality (9.2) is equivalent to the inequality c ≥ 10a(n/(n - 70)) when n > 70.
Because we assume that n ≥ 140, we have n/(n - 70) ≤ 2, and so choosing
c ≥ 20a will satisfy inequality (9.2). (Note that there is nothing special about
the constant 140; we could replace it by any integer strictly greater than 70 and
then choose c accordingly.) The worst-case running time of Select is therefore
linear.
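
The substitution argument can be spot-checked numerically on one concrete instance of the recurrence. The sketch below makes several assumptions that are not in the text: it takes a = 1, charges exactly a·n for inputs of fewer than 140 elements, rounds 7n/10 + 6 down to an integer, and treats the recurrence as an equality. It then tests the claimed bound T(n) ≤ cn with c = 20a.

```python
from functools import lru_cache

A = 1          # assumed constant a: the O(n) term is taken to be exactly a*n
C = 20 * A     # the choice c >= 20a from the analysis


@lru_cache(maxsize=None)
def t(n):
    """One concrete instance of the recurrence for the running time of Select."""
    if n < 140:
        return A * n                      # base case: any value <= c*n works
    ceil_fifth = -(-n // 5)               # ceil(n / 5)
    return t(ceil_fifth) + t(7 * n // 10 + 6) + A * n


def bound_holds(limit):
    """Check T(n) <= c*n for every n with 1 <= n < limit."""
    return all(t(n) <= C * n for n in range(1, limit))
```

A finite check of this kind proves nothing about all n, but it is a quick way to catch an arithmetic slip in the constants.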

As in a comparison sort (see Section 8.1), Select and Randomized-Select
determine information about the relative order of elements only by comparing elements.
Recall from Chapter 8 that sorting requires Ω(n lg n) time in the comparison
model, even on average (see Problem 8-1). The linear-time sorting algorithms
in Chapter 8 make assumptions about the input. In contrast, the linear-time selection
algorithms in this chapter do not require any assumptions about the input.
They are not subject to the Ω(n lg n) lower bound because they manage to solve
the selection problem without sorting. Thus, solving the selection problem by sorting
and indexing, as presented in the introduction to this chapter, is asymptotically
inefficient.

Exercises


9.3- 1 

In  the  algorithm  Select,  the  input  elements  are  divided  into  groups  of  5.  Will 
the  algorithm  work  in  linear  time  if  they  are  divided  into  groups  of  7?  Argue  that 
Select  does  not  run  in  linear  time  if  groups  of  3  are  used. 

9.3- 2 

Analyze  SELECT  to  show  that  if  n  >  140,  then  at  least  \n/ 4]  elements  are  greater 
than  the  median-of-medians  x  and  at  least  ("/; /4]  elements  are  less  than  x. 


9.3-3

Show how quicksort can be made to run in O(n lg n) time in the worst case,
assuming that all elements are distinct.

9.3-4 *

Suppose that an algorithm uses only comparisons to find the ith smallest element
in a set of n elements. Show that it can also find the i - 1 smaller elements and
the n - i larger elements without performing any additional comparisons.


9.3-5

Suppose that you have a “black-box” worst-case linear-time median subroutine.
Give a simple, linear-time algorithm that solves the selection problem for an
arbitrary order statistic.

9.3-6

The kth quantiles of an n-element set are the k - 1 order statistics that divide the
sorted set into k equal-sized sets (to within 1). Give an O(n lg k)-time algorithm
to list the kth quantiles of a set.


9.3-7 

Describe an O(n)-time algorithm that, given a set S of n distinct numbers and
a positive integer k ≤ n, determines the k numbers in S that are closest to the
median of S.


9.3-8 

Let X[1 . . n] and Y[1 . . n] be two arrays, each containing n numbers already in
sorted order. Give an O(lg n)-time algorithm to find the median of all 2n elements
in arrays X and Y.


9.3-9 

Professor Olay is consulting for an oil company, which is planning a large pipeline
running east to west through an oil field of n wells. The company wants to connect
a spur pipeline from each well directly to the main pipeline along a shortest route
(either north or south), as shown in Figure 9.2. Given the x- and y-coordinates of
the wells, how should the professor pick the optimal location of the main pipeline,
which would be the one that minimizes the total length of the spurs? Show how to
determine the optimal location in linear time.

Figure 9.2 Professor Olay needs to determine the position of the east-west oil pipeline that
minimizes the total length of the north-south spurs.


Problems 


9-1  Largest  i  numbers  in  sorted  order 

Given a set of n numbers, we wish to find the i largest in sorted order using a
comparison-based algorithm. Find the algorithm that implements each of the
following methods with the best asymptotic worst-case running time, and analyze the
running times of the algorithms in terms of n and i.

a. Sort the numbers, and list the i largest.

b. Build a max-priority queue from the numbers, and call EXTRACT-MAX i times.

c. Use an order-statistic algorithm to find the ith largest number, partition around
that number, and sort the i largest numbers.


9-2 Weighted median

For n distinct elements $x_1, x_2, \ldots, x_n$ with positive weights $w_1, w_2, \ldots, w_n$ such
that $\sum_{i=1}^{n} w_i = 1$, the weighted (lower) median is the element $x_k$ satisfying

\[ \sum_{x_i < x_k} w_i < \frac{1}{2} \]

and

\[ \sum_{x_i > x_k} w_i \le \frac{1}{2} . \]

For example, if the elements are 0.1, 0.35, 0.05, 0.1, 0.15, 0.05, 0.2 and each
element equals its weight (that is, $w_i = x_i$ for i = 1, 2, ..., 7), then the median is 0.1,
but the weighted median is 0.2.

a. Argue that the median of $x_1, x_2, \ldots, x_n$ is the weighted median of the $x_i$ with
weights $w_i = 1/n$ for i = 1, 2, ..., n.

b. Show how to compute the weighted median of n elements in O(n lg n)
worst-case time using sorting.

c. Show how to compute the weighted median in Θ(n) worst-case time using a
linear-time median algorithm such as SELECT from Section 9.3.

The post-office location problem is defined as follows. We are given n points
$p_1, p_2, \ldots, p_n$ with associated weights $w_1, w_2, \ldots, w_n$. We wish to find a point p
(not necessarily one of the input points) that minimizes the sum $\sum_{i=1}^{n} w_i \, d(p, p_i)$,
where d(a, b) is the distance between points a and b.

d. Argue that the weighted median is a best solution for the 1-dimensional
post-office location problem, in which points are simply real numbers and the
distance between points a and b is d(a, b) = |a - b|.

e. Find the best solution for the 2-dimensional post-office location problem, in
which the points are (x, y) coordinate pairs and the distance between points
$a = (x_1, y_1)$ and $b = (x_2, y_2)$ is the Manhattan distance given by
$d(a, b) = |x_1 - x_2| + |y_1 - y_2|$.

9-3  Small  order  statistics 

We showed that the worst-case number T(n) of comparisons used by SELECT
to select the ith order statistic from n numbers satisfies T(n) = Θ(n), but the
constant hidden by the Θ-notation is rather large. When i is small relative to n, we
can implement a different procedure that uses SELECT as a subroutine but makes
fewer comparisons in the worst case.
a. Describe an algorithm that uses $U_i(n)$ comparisons to find the ith smallest of n
elements, where

\[ U_i(n) = \begin{cases} T(n) & \text{if } i \ge n/2 , \\ \lfloor n/2 \rfloor + U_i(\lceil n/2 \rceil) + T(2i) & \text{otherwise} . \end{cases} \]

(Hint: Begin with ⌊n/2⌋ disjoint pairwise comparisons, and recurse on the set
containing the smaller element from each pair.)


b. Show that, if i < n/2, then $U_i(n) = n + O(T(2i) \lg(n/i))$.

c. Show that if i is a constant less than n/2, then $U_i(n) = n + O(\lg n)$.

d. Show that if i = n/k for k ≥ 2, then $U_i(n) = n + O(T(2n/k) \lg k)$.


9-4  Alternative  analysis  of  randomized  selection 

In this problem, we use indicator random variables to analyze the
RANDOMIZED-SELECT procedure in a manner akin to our analysis of
RANDOMIZED-QUICKSORT in Section 7.4.2.

As in the quicksort analysis, we assume that all elements are distinct, and we
rename the elements of the input array A as $z_1, z_2, \ldots, z_n$, where $z_i$ is the ith
smallest element. Thus, the call RANDOMIZED-SELECT(A, 1, n, k) returns $z_k$.
For 1 ≤ i < j ≤ n, let

\[ X_{ijk} = \mathrm{I}\{ z_i \text{ is compared with } z_j \text{ sometime during the execution of the algorithm to find } z_k \} . \]


a. Give an exact expression for $E[X_{ijk}]$. (Hint: Your expression may have
different values, depending on the values of i, j, and k.)

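b. Let $X_k$ denote the total number of comparisons between elements of array A
when finding $z_k$. Show that

\[ E[X_k] \le 2 \left( \sum_{i=1}^{k} \sum_{j=k}^{n} \frac{1}{j - i + 1} + \sum_{j=k+1}^{n} \frac{j - k - 1}{j - k + 1} + \sum_{i=1}^{k-2} \frac{k - i - 1}{k - i + 1} \right) . \]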


c. Show that $E[X_k] \le 4n$.

d. Conclude that, assuming all elements of array A are distinct,
RANDOMIZED-SELECT runs in expected time O(n).


Chapter notes

The  worst-case  linear-time  median-finding  algorithm  was  devised  by  Blum,  Floyd, 
Pratt,  Rivest,  and  Tarjan  [50].  The  fast  randomized  version  is  due  to  Hoare  [169]. 
Floyd and Rivest [108] have developed an improved randomized version that
partitions around an element recursively selected from a small sample of the elements.

It is still unknown exactly how many comparisons are needed to determine the
median. Bent and John [41] gave a lower bound of 2n comparisons for median
finding, and Schönhage, Paterson, and Pippenger [302] gave an upper bound of 3n.
Dor and Zwick have improved on both of these bounds. Their upper bound [93]
is slightly less than 2.95n, and their lower bound [94] is (2 + ε)n, for a small
positive constant ε, thereby improving slightly on related work by Dor et al. [92].
Paterson [272] describes some of these results along with other related work.
Introduction


Sets  are  as  fundamental  to  computer  science  as  they  are  to  mathematics.  Whereas 
mathematical  sets  are  unchanging,  the  sets  manipulated  by  algorithms  can  grow, 
shrink,  or  otherwise  change  over  time.  We  call  such  sets  dynamic.  The  next  five 
chapters  present  some  basic  techniques  for  representing  finite  dynamic  sets  and 
manipulating  them  on  a  computer. 

Algorithms  may  require  several  different  types  of  operations  to  be  performed  on 
sets.  For  example,  many  algorithms  need  only  the  ability  to  insert  elements  into, 
delete  elements  from,  and  test  membership  in  a  set.  We  call  a  dynamic  set  that 
supports  these  operations  a  dictionary.  Other  algorithms  require  more  complicated 
operations.  For  example,  min-priority  queues,  which  Chapter  6  introduced  in  the 
context  of  the  heap  data  structure,  support  the  operations  of  inserting  an  element 
into  and  extracting  the  smallest  element  from  a  set.  The  best  way  to  implement  a 
dynamic  set  depends  upon  the  operations  that  must  be  supported. 

Elements  of  a  dynamic  set 

In  a  typical  implementation  of  a  dynamic  set,  each  element  is  represented  by  an 
object  whose  attributes  can  be  examined  and  manipulated  if  we  have  a  pointer  to 
the  object.  (Section  10.3  discusses  the  implementation  of  objects  and  pointers  in 
programming  environments  that  do  not  contain  them  as  basic  data  types.)  Some 
kinds  of  dynamic  sets  assume  that  one  of  the  object’s  attributes  is  an  identifying 
key.  If  the  keys  are  all  different,  we  can  think  of  the  dynamic  set  as  being  a  set 
of  key  values.  The  object  may  contain  satellite  data,  which  are  carried  around  in 
other  object  attributes  but  are  otherwise  unused  by  the  set  implementation.  It  may 
also have attributes that are manipulated by the set operations; these attributes may
contain  data  or  pointers  to  other  objects  in  the  set. 

Some  dynamic  sets  presuppose  that  the  keys  are  drawn  from  a  totally  ordered 
set,  such  as  the  real  numbers,  or  the  set  of  all  words  under  the  usual  alphabetic 
ordering.  A  total  ordering  allows  us  to  define  the  minimum  element  of  the  set,  for 
example,  or  to  speak  of  the  next  element  larger  than  a  given  element  in  a  set. 

Operations  on  dynamic  sets 

Operations  on  a  dynamic  set  can  be  grouped  into  two  categories:  queries,  which 
simply  return  information  about  the  set,  and  modifying  operations,  which  change 
the  set.  Here  is  a  list  of  typical  operations.  Any  specific  application  will  usually 
require  only  a  few  of  these  to  be  implemented. 

SEARCH(S, k)
    A query that, given a set S and a key value k, returns a pointer x to an element
    in S such that x.key = k, or NIL if no such element belongs to S.

INSERT(S, x)
    A modifying operation that augments the set S with the element pointed to
    by x. We usually assume that any attributes in element x needed by the set
    implementation have already been initialized.

DELETE(S, x)
    A modifying operation that, given a pointer x to an element in the set S,
    removes x from S. (Note that this operation takes a pointer to an element x, not
    a key value.)

MINIMUM(S)
    A query on a totally ordered set S that returns a pointer to the element of S
    with the smallest key.

MAXIMUM(S)
    A query on a totally ordered set S that returns a pointer to the element of S
    with the largest key.

SUCCESSOR(S, x)
    A query that, given an element x whose key is from a totally ordered set S,
    returns a pointer to the next larger element in S, or NIL if x is the maximum
    element.

PREDECESSOR(S, x)
    A query that, given an element x whose key is from a totally ordered set S,
    returns a pointer to the next smaller element in S, or NIL if x is the minimum
    element.
In some situations, we can extend the queries SUCCESSOR and PREDECESSOR
so that they apply to sets with nondistinct keys. For a set on n keys, the normal
presumption is that a call to MINIMUM followed by n - 1 calls to SUCCESSOR
enumerates the elements in the set in sorted order.
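
To make the operation list concrete, here is a minimal Python sketch of a dynamic set of distinct keys. It is an illustration under stated assumptions, not an implementation from this book: the class name `SortedListSet` and the helper `in_sorted_order` are hypothetical, elements are represented by their key values rather than by pointers to objects, and the backing store is a sorted Python list, so insertion and deletion take linear time.

```python
import bisect


class SortedListSet:
    """Hypothetical dynamic set of distinct keys, kept in a sorted list."""

    def __init__(self):
        self._keys = []

    def search(self, k):
        # Binary search; returns the key if present, otherwise None (NIL).
        i = bisect.bisect_left(self._keys, k)
        return k if i < len(self._keys) and self._keys[i] == k else None

    def insert(self, k):
        i = bisect.bisect_left(self._keys, k)
        if i == len(self._keys) or self._keys[i] != k:
            self._keys.insert(i, k)  # shifts later keys: linear time

    def delete(self, k):
        i = bisect.bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            del self._keys[i]

    def minimum(self):
        return self._keys[0] if self._keys else None

    def maximum(self):
        return self._keys[-1] if self._keys else None

    def successor(self, k):
        # Next larger key, or None if k is the maximum.
        i = bisect.bisect_right(self._keys, k)
        return self._keys[i] if i < len(self._keys) else None

    def predecessor(self, k):
        # Next smaller key, or None if k is the minimum.
        i = bisect.bisect_left(self._keys, k)
        return self._keys[i - 1] if i > 0 else None


def in_sorted_order(s):
    """Enumerate s by one minimum() call, then successor() until None."""
    out = []
    x = s.minimum()
    while x is not None:
        out.append(x)
        x = s.successor(x)
    return out
```

Calling `in_sorted_order` on a set built from the keys 9, 16, 4, 1 walks the keys from smallest to largest, which is the enumeration property described above.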

We usually measure the time taken to execute a set operation in terms of the size
of the set. For example, Chapter 13 describes a data structure that can support any
of the operations listed above on a set of size n in time O(lg n).

Overview  of  Part  III 

Chapters  10-14  describe  several  data  structures  that  we  can  use  to  implement 
dynamic  sets;  we  shall  use  many  of  these  later  to  construct  efficient  algorithms 
for  a  variety  of  problems.  We  already  saw  another  important  data  structure— the 
heap— in  Chapter  6. 

Chapter  10  presents  the  essentials  of  working  with  simple  data  structures  such 
as  stacks,  queues,  linked  lists,  and  rooted  trees.  It  also  shows  how  to  implement 
objects  and  pointers  in  programming  environments  that  do  not  support  them  as 
primitives.  If  you  have  taken  an  introductory  programming  course,  then  much  of 
this  material  should  be  familiar  to  you. 

Chapter 11 introduces hash tables, which support the dictionary operations INSERT,
DELETE, and SEARCH. In the worst case, hashing requires Θ(n) time to perform
a SEARCH operation, but the expected time for hash-table operations is O(1).
The analysis of hashing relies on probability, but most of the chapter requires no
background in the subject.

Binary search trees, which are covered in Chapter 12, support all the dynamic-set
operations listed above. In the worst case, each operation takes Θ(n) time on a
tree with n elements, but on a randomly built binary search tree, the expected time
for each operation is O(lg n). Binary search trees serve as the basis for many other
data structures.

Chapter  13  introduces  red-black  trees,  which  are  a  variant  of  binary  search  trees. 
Unlike  ordinary  binary  search  trees,  red-black  trees  are  guaranteed  to  perform  well: 
operations take O(lg n) time in the worst case. A red-black tree is a balanced search
tree;  Chapter  18  in  Part  V  presents  another  kind  of  balanced  search  tree,  called  a 
B-tree.  Although  the  mechanics  of  red-black  trees  are  somewhat  intricate,  you  can 
glean  most  of  their  properties  from  the  chapter  without  studying  the  mechanics  in 
detail.  Nevertheless,  you  probably  will  find  walking  through  the  code  to  be  quite 
instructive. 

In  Chapter  14,  we  show  how  to  augment  red-black  trees  to  support  operations 
other  than  the  basic  ones  listed  above.  First,  we  augment  them  so  that  we  can 
dynamically  maintain  order  statistics  for  a  set  of  keys.  Then,  we  augment  them  in 
a  different  way  to  maintain  intervals  of  real  numbers. 
In this chapter, we examine the representation of dynamic sets by simple data
structures that use pointers. Although we can construct many complex data structures
using pointers, we present only the rudimentary ones: stacks, queues, linked lists,
and rooted trees. We also show ways to synthesize objects and pointers from
arrays.


10.1  Stacks  and  queues 

Stacks  and  queues  are  dynamic  sets  in  which  the  element  removed  from  the  set 
by  the  Delete  operation  is  prespecified.  In  a  stack,  the  element  deleted  from 
the  set  is  the  one  most  recently  inserted:  the  stack  implements  a  last-in,  first-out, 
or  LIFO,  policy.  Similarly,  in  a  queue,  the  element  deleted  is  always  the  one  that 
has  been  in  the  set  for  the  longest  time:  the  queue  implements  a  first-in,  first-out, 
or FIFO, policy. There are several efficient ways to implement stacks and queues
on  a  computer.  In  this  section  we  show  how  to  use  a  simple  array  to  implement 
each. 

Stacks 

The INSERT operation on a stack is often called PUSH, and the DELETE operation,
which does not take an element argument, is often called POP. These names
are  allusions  to  physical  stacks,  such  as  the  spring-loaded  stacks  of  plates  used 
in  cafeterias.  The  order  in  which  plates  are  popped  from  the  stack  is  the  reverse 
of  the  order  in  which  they  were  pushed  onto  the  stack,  since  only  the  top  plate  is 
accessible. 

As Figure 10.1 shows, we can implement a stack of at most n elements with
an array S[1 . . n]. The array has an attribute S.top that indexes the most recently
inserted element. The stack consists of elements S[1 . . S.top], where S[1] is the
element at the bottom of the stack and S[S.top] is the element at the top.

Figure 10.1 An array implementation of a stack S. Stack elements appear only in the lightly shaded
positions. (a) Stack S has 4 elements. The top element is 9. (b) Stack S after the calls PUSH(S, 17)
and PUSH(S, 3). (c) Stack S after the call POP(S) has returned the element 3, which is the one most
recently pushed. Although element 3 still appears in the array, it is no longer in the stack; the top is
element 17.

When  S.top  =  0,  the  stack  contains  no  elements  and  is  empty.  We  can  test  to 
see  whether  the  stack  is  empty  by  query  operation  Stack-Empty.  If  we  attempt 
to  pop  an  empty  stack,  we  say  the  stack  underflows,  which  is  normally  an  error. 
If S.top exceeds n, the stack overflows. (In our pseudocode implementation, we
don’t  worry  about  stack  overflow.) 

We  can  implement  each  of  the  stack  operations  with  just  a  few  lines  of  code: 

STACK-EMPTY(S)
1  if S.top == 0
2      return TRUE
3  else return FALSE

PUSH(S, x)
1  S.top = S.top + 1
2  S[S.top] = x

POP(S)
1  if STACK-EMPTY(S)
2      error “underflow”
3  else S.top = S.top - 1
4      return S[S.top + 1]

Figure 10.1 shows the effects of the modifying operations PUSH and POP. Each of
the three stack operations takes O(1) time.
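
The three procedures translate almost line for line into a runnable form. The following Python sketch is one possible rendering, under a few assumptions that are not in the pseudocode: the class name `ArrayStack` and the field name `slots` are hypothetical, index 0 of the backing list is left unused so that positions match the 1-based array S[1 . . n], and `push` raises an error on overflow, a check the pseudocode deliberately omits.

```python
class ArrayStack:
    """Hypothetical fixed-capacity stack mirroring S[1..n] and S.top."""

    def __init__(self, n):
        self.slots = [None] * (n + 1)  # index 0 unused: keeps 1-based indexing
        self.n = n
        self.top = 0                   # top == 0 means the stack is empty

    def is_empty(self):
        # STACK-EMPTY
        return self.top == 0

    def push(self, x):
        # PUSH, plus an overflow check that the pseudocode leaves out
        if self.top == self.n:
            raise OverflowError("overflow")
        self.top += 1
        self.slots[self.top] = x

    def pop(self):
        # POP: the popped value stays in the array but is no longer in the stack
        if self.is_empty():
            raise IndexError("underflow")
        self.top -= 1
        return self.slots[self.top + 1]
```

For example, pushing four values ending in 9, then 17 and 3, and popping once returns 3 and leaves 17 on top, while the 3 is still physically present in the slot just above `top`, as the caption of Figure 10.1 describes.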




Chapter  10  Elementary  Data  Structures 


Figure 10.2 A queue implemented using an array Q[1 . . 12]. Queue elements appear only in the
lightly shaded positions. (a) The queue has 5 elements, in locations Q[7 . . 11]. (b) The configuration
of the queue after the calls ENQUEUE(Q, 17), ENQUEUE(Q, 3), and ENQUEUE(Q, 5). (c) The
configuration of the queue after the call DEQUEUE(Q) returns the key value 15 formerly at the
head of the queue. The new head has key 6.


Queues 

We call the INSERT operation on a queue ENQUEUE, and we call the DELETE
operation DEQUEUE; like the stack operation POP, DEQUEUE takes no element
argument. The FIFO property of a queue causes it to operate like a line of customers
waiting to pay a cashier. The queue has a head and a tail. When an element is
enqueued, it takes its place at the tail of the queue, just as a newly arriving customer
takes a place at the end of the line. The element dequeued is always the one at
the head of the queue, like the customer at the head of the line who has waited the
longest.

Figure 10.2 shows one way to implement a queue of at most n - 1 elements
using an array Q[1 . . n]. The queue has an attribute Q.head that indexes, or points
to, its head. The attribute Q.tail indexes the next location at which a newly arriving
element will be inserted into the queue. The elements in the queue reside in
locations Q.head, Q.head + 1, ..., Q.tail - 1, where we “wrap around” in the
sense that location 1 immediately follows location n in a circular order. When
Q.head = Q.tail, the queue is empty. Initially, we have Q.head = Q.tail = 1.
If we attempt to dequeue an element from an empty queue, the queue underflows.
When Q.head = Q.tail + 1, or when Q.head = 1 and Q.tail = Q.length (the same
condition after wrapping around), the queue is full, and if we attempt to enqueue an
element, then the queue overflows.

In  our  procedures  Enqueue  and  Dequeue,  we  have  omitted  the  error  checking 
for  underflow  and  overflow.  (Exercise  10.1-4  asks  you  to  supply  code  that  checks 
for these two error conditions.) The pseudocode assumes that n = Q.length.

ENQUEUE(Q, x)
1  Q[Q.tail] = x
2  if Q.tail == Q.length
3      Q.tail = 1
4  else Q.tail = Q.tail + 1

DEQUEUE(Q)
1  x = Q[Q.head]
2  if Q.head == Q.length
3      Q.head = 1
4  else Q.head = Q.head + 1
5  return x

Figure 10.2 shows the effects of the ENQUEUE and DEQUEUE operations. Each
operation takes O(1) time.
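
A runnable rendering of the circular array may help in tracing the wraparound. The Python sketch below is an illustration, not the text's own code: the names `ArrayQueue`, `slots`, and `_next` are hypothetical, index 0 of the backing list is unused so that positions match Q[1 . . n], and the underflow and overflow checks are additions that the pseudocode above leaves out.

```python
class ArrayQueue:
    """Hypothetical circular queue mirroring Q[1..n], Q.head, and Q.tail."""

    def __init__(self, n):
        self.slots = [None] * (n + 1)  # index 0 unused: keeps 1-based indexing
        self.length = n
        self.head = 1
        self.tail = 1                  # head == tail means the queue is empty

    def _next(self, i):
        # Location 1 immediately follows location n in circular order.
        return 1 if i == self.length else i + 1

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        # Full when advancing tail would collide with head, so at most
        # n - 1 of the n slots ever hold elements.
        return self._next(self.tail) == self.head

    def enqueue(self, x):
        if self.is_full():
            raise OverflowError("overflow")   # added check
        self.slots[self.tail] = x
        self.tail = self._next(self.tail)

    def dequeue(self):
        if self.is_empty():
            raise IndexError("underflow")     # added check
        x = self.slots[self.head]
        self.head = self._next(self.head)
        return x
```

Starting a 12-slot queue with head and tail at position 7, enqueuing five elements, and then enqueuing 17, 3, and 5 moves the tail from 12 around to 3, which is the wraparound shown in Figure 10.2(b).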

Exercises 


10.1-1 

Using Figure 10.1 as a model, illustrate the result of each operation in the sequence
PUSH(S, 4), PUSH(S, 1), PUSH(S, 3), POP(S), PUSH(S, 8), and POP(S) on an
initially empty stack S stored in array S[1 . . 6].


10.1-2 

Explain how to implement two stacks in one array A[1 . . n] in such a way that
neither stack overflows unless the total number of elements in both stacks together
is n. The PUSH and POP operations should run in O(1) time.


10.1-3 

Using Figure 10.2 as a model, illustrate the result of each operation in the
sequence ENQUEUE(Q, 4), ENQUEUE(Q, 1), ENQUEUE(Q, 3), DEQUEUE(Q),
ENQUEUE(Q, 8), and DEQUEUE(Q) on an initially empty queue Q stored in
array Q[1 . . 6].


10.1-4 

Rewrite  Enqueue  and  Dequeue  to  detect  underflow  and  overflow  of  a  queue. 
10.1-5

Whereas a stack allows insertion and deletion of elements at only one end, and a
queue allows insertion at one end and deletion at the other end, a deque
(double-ended queue) allows insertion and deletion at both ends. Write four O(1)-time
procedures to insert elements into and delete elements from both ends of a deque
implemented by an array.

10.1-6

Show  how  to  implement  a  queue  using  two  stacks.  Analyze  the  running  time  of  the 
queue  operations. 

10.1-7 

Show  how  to  implement  a  stack  using  two  queues.  Analyze  the  running  time  of  the 
stack  operations. 


10.2  Linked  lists 

A  linked  list  is  a  data  structure  in  which  the  objects  are  arranged  in  a  linear  order. 
Unlike  an  array,  however,  in  which  the  linear  order  is  determined  by  the  array 
indices,  the  order  in  a  linked  list  is  determined  by  a  pointer  in  each  object.  Linked 
lists  provide  a  simple,  flexible  representation  for  dynamic  sets,  supporting  (though 
not  necessarily  efficiently)  all  the  operations  listed  on  page  230. 

As shown in Figure 10.3, each element of a doubly linked list L is an object with
an attribute key and two other pointer attributes: next and prev. The object may
also contain other satellite data. Given an element x in the list, x.next points to its
successor in the linked list, and x.prev points to its predecessor. If x.prev = NIL,
the element x has no predecessor and is therefore the first element, or head, of
the list. If x.next = NIL, the element x has no successor and is therefore the last
element, or tail, of the list. An attribute L.head points to the first element of the
list. If L.head = NIL, the list is empty.

A  list  may  have  one  of  several  forms.  It  may  be  either  singly  linked  or  doubly 
linked,  it  may  be  sorted  or  not,  and  it  may  be  circular  or  not.  If  a  list  is  singly 
linked,  we  omit  the  prev  pointer  in  each  element.  If  a  list  is  sorted,  the  linear  order 
of  the  list  corresponds  to  the  linear  order  of  keys  stored  in  elements  of  the  list;  the 
minimum  element  is  then  the  head  of  the  list,  and  the  maximum  element  is  the 
tail.  If  the  list  is  unsorted,  the  elements  can  appear  in  any  order.  In  a  circular  list, 
the  prev  pointer  of  the  head  of  the  list  points  to  the  tail,  and  the  next  pointer  of 
the tail of the list points to the head. We can think of a circular list as a ring of
elements. In the remainder of this section, we assume that the lists with which we
are working are unsorted and doubly linked.

Figure 10.3 (a) A doubly linked list L representing the dynamic set {1, 4, 9, 16}. Each element in
the list is an object with attributes for the key and pointers (shown by arrows) to the next and previous
objects. The next attribute of the tail and the prev attribute of the head are NIL, indicated by a diagonal
slash. The attribute L.head points to the head. (b) Following the execution of LIST-INSERT(L, x),
where x.key = 25, the linked list has a new object with key 25 as the new head. This new object
points to the old head with key 9. (c) The result of the subsequent call LIST-DELETE(L, x), where x
points to the object with key 4.

Searching  a  linked  list 

The procedure LIST-SEARCH(L, k) finds the first element with key k in list L
by a simple linear search, returning a pointer to this element. If no object with
key k appears in the list, then the procedure returns NIL. For the linked list in
Figure 10.3(a), the call LIST-SEARCH(L, 4) returns a pointer to the third element,
and the call LIST-SEARCH(L, 7) returns NIL.

LIST-SEARCH(L, k)
1  x = L.head
2  while x ≠ NIL and x.key ≠ k
3      x = x.next
4  return x

To search a list of n objects, the LIST-SEARCH procedure takes Θ(n) time in the
worst case, since it may have to search the entire list.

Inserting  into  a  linked  list 

Given  an  element  x  whose  key  attribute  has  already  been  set,  the  List-Insert 
procedure  “splices”  x  onto  the  front  of  the  linked  list,  as  shown  in  Figure  10.3(b). 
LIST-INSERT(L, x)
1  x.next = L.head
2  if L.head ≠ NIL
3      L.head.prev = x
4  L.head = x
5  x.prev = NIL

(Recall that our attribute notation can cascade, so that L.head.prev denotes the
prev attribute of the object that L.head points to.) The running time for
LIST-INSERT on a list of n elements is O(1).

Deleting  from  a  linked  list 

The  procedure  List-Delete  removes  an  element  x  from  a  linked  list  L.  It  must 
be  given  a  pointer  to  x,  and  it  then  “splices”  x  out  of  the  list  by  updating  pointers. 
If  we  wish  to  delete  an  element  with  a  given  key,  we  must  first  call  List-Search 
to  retrieve  a  pointer  to  the  element. 

LIST-DELETE(L, x)
1  if x.prev ≠ NIL
2      x.prev.next = x.next
3  else L.head = x.next
4  if x.next ≠ NIL
5      x.next.prev = x.prev

Figure 10.3(c) shows how an element is deleted from a linked list. LIST-DELETE
runs in O(1) time, but if we wish to delete an element with a given key, Θ(n) time
is required in the worst case because we must first call LIST-SEARCH to find the
element.
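
The search, insert, and delete procedures for an unsorted, doubly linked list can be tried out directly. The Python sketch below follows the pseudocode step by step, with Python's `None` standing in for NIL. The class names `Node` and `LinkedList` and the helper `keys` are hypothetical names introduced only for this illustration.

```python
class Node:
    """List element with a key and prev/next pointers (None plays NIL)."""

    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None  # empty list


def list_search(L, k):
    # Linear scan from the head; returns the first node with key k, or None.
    x = L.head
    while x is not None and x.key != k:
        x = x.next
    return x


def list_insert(L, x):
    # Splice x onto the front of the list.
    x.next = L.head
    if L.head is not None:
        L.head.prev = x
    L.head = x
    x.prev = None


def list_delete(L, x):
    # Splice x out; the two branches handle x being the head or the tail.
    if x.prev is not None:
        x.prev.next = x.next
    else:
        L.head = x.next
    if x.next is not None:
        x.next.prev = x.prev


def keys(L):
    # Helper for inspection: keys from head to tail.
    out, x = [], L.head
    while x is not None:
        out.append(x.key)
        x = x.next
    return out
```

Inserting the keys 1, 4, 16, 9 in that order yields the list of Figure 10.3(a) with 9 at the head; inserting 25 and then deleting the node found by `list_search(L, 4)` reproduces parts (b) and (c).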

Sentinels 

The  code  for  List-Delete  would  be  simpler  if  we  could  ignore  the  boundary 
conditions  at  the  head  and  tail  of  the  list: 

LIST-DELETE'(L, x)
1  x.prev.next = x.next
2  x.next.prev = x.prev

A sentinel is a dummy object that allows us to simplify boundary conditions. For
example, suppose that we provide with list L an object L.nil that represents NIL
but has all the attributes of the other objects in the list. Wherever we have a
reference to NIL in list code, we replace it by a reference to the sentinel L.nil. As
shown in Figure 10.4, this change turns a regular doubly linked list into a circular,
doubly linked list with a sentinel, in which the sentinel L.nil lies between the
head and tail. The attribute L.nil.next points to the head of the list, and L.nil.prev
points to the tail. Similarly, both the next attribute of the tail and the prev
attribute of the head point to L.nil. Since L.nil.next points to the head, we can
eliminate the attribute L.head altogether, replacing references to it by references
to L.nil.next. Figure 10.4(a) shows that an empty list consists of just the sentinel,
and both L.nil.next and L.nil.prev point to L.nil.

Figure 10.4 A circular, doubly linked list with a sentinel. The sentinel L.nil appears between the
head and tail. The attribute L.head is no longer needed, since we can access the head of the list
by L.nil.next. (a) An empty list. (b) The linked list from Figure 10.3(a), with key 9 at the head and
key 1 at the tail. (c) The list after executing LIST-INSERT'(L, x), where x.key = 25. The new object
becomes the head of the list. (d) The list after deleting the object with key 1. The new tail is the
object with key 4.

The  code  for  List-Search  remains  the  same  as  before,  but  with  the  references 
to  NIL  and  L.head  changed  as  specified  above: 

LIST-SEARCH'(L, k)
1  x = L.nil.next
2  while x ≠ L.nil and x.key ≠ k
3      x = x.next
4  return x

We  use  the  two-line  procedure  List-Delete'  from  before  to  delete  an  element 
from  the  list.  The  following  procedure  inserts  an  element  into  the  list: 
LIST-INSERT'(L, x)
1  x.next = L.nil.next
2  L.nil.next.prev = x
3  L.nil.next = x
4  x.prev = L.nil

Figure  10.4  shows  the  effects  of  List-Insert'  and  List-Delete'  on  a  sample  list. 
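
The sentinel versions can be sketched in Python in the same way. This is an illustrative rendering under stated assumptions: the names `SNode`, `SentinelList`, and the `_sentinel` function suffixes are hypothetical, and the sentinel is an ordinary node whose key is `None`. Note that an unsuccessful search returns the sentinel itself rather than `None`, because the sentinel has replaced NIL throughout.

```python
class SNode:
    """List element; the sentinel is just an SNode with key None."""

    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None


class SentinelList:
    def __init__(self):
        # An empty list is the sentinel pointing to itself in both directions.
        self.nil = SNode(None)
        self.nil.next = self.nil
        self.nil.prev = self.nil


def list_search_sentinel(L, k):
    # Same scan as before, with L.nil in place of NIL and L.head.
    x = L.nil.next
    while x is not L.nil and x.key != k:
        x = x.next
    return x


def list_insert_sentinel(L, x):
    # No empty-list test is needed: L.nil.next always exists.
    x.next = L.nil.next
    L.nil.next.prev = x
    L.nil.next = x
    x.prev = L.nil


def list_delete_sentinel(L, x):
    # No head or tail special cases: x always has real neighbors.
    x.prev.next = x.next
    x.next.prev = x.prev
```

Building the list of Figure 10.4(b), inserting a node with key 25, and deleting the node with key 1 leaves 25 at `L.nil.next` and 4 at `L.nil.prev`, matching parts (c) and (d) of the figure.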

Sentinels  rarely  reduce  the  asymptotic  time  bounds  of  data  structure  operations, 
but  they  can  reduce  constant  factors.  The  gain  from  using  sentinels  within  loops 
is  usually  a  matter  of  clarity  of  code  rather  than  speed;  the  linked  list  code,  for 
example, becomes simpler when we use sentinels, but we save only O(1) time in
the List-Insert' and List-Delete' procedures. In other situations, however, the
use of sentinels helps to tighten the code in a loop, thus reducing the
coefficient of, say, n or n^2 in the running time.

We  should  use  sentinels  judiciously.  When  there  are  many  small  lists,  the  extra 
storage  used  by  their  sentinels  can  represent  significant  wasted  memory.  In  this 
book,  we  use  sentinels  only  when  they  truly  simplify  the  code. 

Exercises 


10.2-1 

Can  you  implement  the  dynamic-set  operation  INSERT  on  a  singly  linked  list 
in O(1) time? How about DELETE?

10.2-2 

Implement  a  stack  using  a  singly  linked  list  L.  The  operations  PUSH  and  POP 
should still take O(1) time.


10.2-3 

Implement a queue by a singly linked list L. The operations Enqueue and
Dequeue should still take O(1) time.


10.2-4 

As written, each loop iteration in the List-Search' procedure requires two
tests: one for x ≠ L.nil and one for x.key ≠ k. Show how to eliminate the test
for x ≠ L.nil in each iteration.


10.2-5 

Implement  the  dictionary  operations  Insert,  Delete,  and  Search  using  singly 
linked,  circular  lists.  What  are  the  running  times  of  your  procedures? 

10.2-6

The dynamic-set operation UNION takes two disjoint sets S1 and S2 as input, and
it returns a set S = S1 ∪ S2 consisting of all the elements of S1 and S2. The
sets S1 and S2 are usually destroyed by the operation. Show how to support UNION
in O(1) time using a suitable list data structure.


10.2-7

Give a Θ(n)-time nonrecursive procedure that reverses a singly linked list of n
elements.  The  procedure  should  use  no  more  than  constant  storage  beyond  that 
needed  for  the  list  itself. 

10.2-8 *

Explain how to implement doubly linked lists using only one pointer value x.np
per item instead of the usual two (next and prev). Assume that all pointer
values can be interpreted as k-bit integers, and define x.np to be
x.np = x.next XOR x.prev, the k-bit “exclusive-or” of x.next and x.prev. (The
value NIL is represented by 0.) Be sure to describe what information you need to
access the head of the list. Show how to implement the Search, Insert, and
Delete operations on such a list. Also show how to reverse such a list in O(1)
time.


10.3  Implementing  pointers  and  objects 

How  do  we  implement  pointers  and  objects  in  languages  that  do  not  provide  them? 
In this section, we shall see two ways of implementing linked data structures
without an explicit pointer data type. We shall synthesize objects and pointers
from arrays and array indices.

A  multiple-array  representation  of  objects 

We  can  represent  a  collection  of  objects  that  have  the  same  attributes  by  using  an 
array  for  each  attribute.  As  an  example,  Figure  10.5  shows  how  we  can  implement 
the  linked  list  of  Figure  10.3(a)  with  three  arrays.  The  array  key  holds  the  values 
of  the  keys  currently  in  the  dynamic  set,  and  the  pointers  reside  in  the  arrays  next 
and prev. For a given array index x, the array entries key[x], next[x], and prev[x]
represent  an  object  in  the  linked  list.  Under  this  interpretation,  a  pointer  x  is  simply 
a  common  index  into  the  key,  next,  and  prev  arrays. 

In Figure 10.3(a), the object with key 4 follows the object with key 16 in the
linked list. In Figure 10.5, key 4 appears in key[2], and key 16 appears in
key[5], and so next[5] = 2 and prev[2] = 5. Although the constant NIL appears in
the next attribute of the tail and the prev attribute of the head, we usually
use an integer (such as 0 or -1) that cannot possibly represent an actual index
into the arrays. A variable L holds the index of the head of the list.

Figure 10.5 The linked list of Figure 10.3(a) represented by the arrays key,
next, and prev. Each vertical slice of the arrays represents a single object.
Stored pointers correspond to the array indices shown at the top; the arrows
show how to interpret them. Lightly shaded object positions contain list
elements. The variable L keeps the index of the head.
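
The multiple-array representation described above can be sketched in Python as three parallel lists. Only key[2] = 4, key[5] = 16, next[5] = 2, and prev[2] = 5 are taken from the text; the slots chosen for keys 9 and 1, the head index L = 7, the use of 0 as NIL, and the helper names are assumptions made for illustration.

```python
NIL = 0  # assumed "no object" value; real slots are numbered 1..8

# Index 0 is unused padding so that slot numbers start at 1.
# Slots for keys 9 and 1 are an assumed layout, not taken from the text.
key   = [None, None, 4,   1,   None, 16,  None, 9,   None]
next_ = [NIL,  NIL,  3,   NIL, NIL,  2,   NIL,  5,   NIL]
prev  = [NIL,  NIL,  5,   2,   NIL,  7,   NIL,  NIL, NIL]
L = 7    # index of the head of the list (assumed)


def list_keys(head):
    """Follow next_ indices from head and collect the keys in list order."""
    out, x = [], head
    while x != NIL:
        out.append(key[x])
        x = next_[x]
    return out


def list_search(head, k):
    """Return the index of the first object with key k, or NIL."""
    x = head
    while x != NIL and key[x] != k:
        x = next_[x]
    return x
```

Here a "pointer" is nothing more than a common index into the three lists, so following x.next becomes the array lookup next_[x].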

A  single-array  representation  of  objects 

The  words  in  a  computer  memory  are  typically  addressed  by  integers  from  0 
to M - 1, where M is a suitably large integer. In many programming languages,
an  object  occupies  a  contiguous  set  of  locations  in  the  computer  memory.  A  pointer 
is  simply  the  address  of  the  first  memory  location  of  the  object,  and  we  can  address 
other  memory  locations  within  the  object  by  adding  an  offset  to  the  pointer. 

We can use the same strategy for implementing objects in programming
environments that do not provide explicit pointer data types. For example,
Figure 10.6 shows how to use a single array A to store the linked list from
Figures 10.3(a) and 10.5. An object occupies a contiguous subarray A[j..k]. Each
attribute of the object corresponds to an offset in the range from 0 to k - j,
and a pointer to the object is the index j. In Figure 10.6, the offsets
corresponding to key, next, and prev are 0, 1, and 2, respectively. To read the
value of i.prev, given a pointer i, we add the value i of the pointer to the
offset 2, thus reading A[i + 2].
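
A minimal Python sketch of the single-array idea follows. The offsets 0, 1, and 2 come from the text; the particular layout of the array, the 0-based indexing, the value -1 for NIL, and the helper names are assumptions made for illustration and do not reproduce Figure 10.6.

```python
NIL = -1                     # assumed "no object" value; never a valid index
KEY, NEXT, PREV = 0, 1, 2    # attribute offsets within each object

# Hypothetical layout: four 3-slot objects starting at indices 0, 3, 6, and 9.
A = [
    4, 9, 3,       # object 0: key 4,  next -> object 9, prev -> object 3
    16, 0, 6,      # object 3: key 16, next -> object 0, prev -> object 6
    9, 3, NIL,     # object 6: key 9 (head), next -> object 3
    1, NIL, 0,     # object 9: key 1 (tail), prev -> object 0
]
L = 6              # pointer to the head: the index of its first slot


def get(i, offset):
    """Read one attribute of the object at pointer i; i.prev is A[i + 2]."""
    return A[i + offset]


def keys_forward(head):
    """Walk the list through the NEXT offsets and collect the keys."""
    out, i = [], head
    while i != NIL:
        out.append(get(i, KEY))
        i = get(i, NEXT)
    return out
```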

The single-array representation is flexible in that it permits objects of
different lengths to be stored in the same array. The problem of managing such a
heterogeneous collection of objects is more difficult than the problem of
managing a homogeneous collection, where all objects have the same attributes.
Since most of the data structures we shall consider are composed of homogeneous
elements, it will be sufficient for our purposes to use the multiple-array
representation of objects.

Figure 10.6 The linked list of Figures 10.3(a) and 10.5 represented in a single array A. Each list
element  is  an  object  that  occupies  a  contiguous  subarray  of  length  3  within  the  array.  The  three 
attributes  key,  next,  and  prev  correspond  to  the  offsets  0,  1,  and  2,  respectively,  within  each  object. 
A  pointer  to  an  object  is  the  index  of  the  first  element  of  the  object.  Objects  containing  list  elements 
are  lightly  shaded,  and  arrows  show  the  list  ordering. 

Allocating  and  freeing  objects 

To insert a key into a dynamic set represented by a doubly linked list, we must
allocate a pointer to a currently unused object in the linked-list
representation. Thus, it is useful to manage the storage of objects not
currently used in the linked-list representation so that one can be allocated.
In some systems, a garbage collector is responsible for determining which
objects are unused. Many applications, however, are simple enough that they can
bear responsibility for returning an unused object to a storage manager. We
shall now explore the problem of allocating and freeing (or deallocating)
homogeneous objects using the example of a doubly linked list represented by
multiple arrays.

Suppose that the arrays in the multiple-array representation have length m and
that at some moment the dynamic set contains n ≤ m elements. Then n objects
represent elements currently in the dynamic set, and the remaining m - n objects
are free; the free objects are available to represent elements inserted into the
dynamic set in the future.

We  keep  the  free  objects  in  a  singly  linked  list,  which  we  call  the  free  list.  The 
free  list  uses  only  the  next  array,  which  stores  the  next  pointers  within  the  list. 
The  head  of  the  free  list  is  held  in  the  global  variable  free.  When  the  dynamic 
set  represented  by  linked  list  L  is  nonempty,  the  free  list  may  be  intertwined  with 
list  L,  as  shown  in  Figure  10.7.  Note  that  each  object  in  the  representation  is  either 
in  list  L  or  in  the  free  list,  but  not  in  both. 

The  free  list  acts  like  a  stack:  the  next  object  allocated  is  the  last  one  freed.  We 
can  use  a  list  implementation  of  the  stack  operations  PUSH  and  POP  to  implement 
the  procedures  for  allocating  and  freeing  objects,  respectively.  We  assume  that  the 
global  variable  free  used  in  the  following  procedures  points  to  the  first  element  of 
the  free  list. 

Figure 10.7 The effect of the Allocate-Object and Free-Object procedures.
(a) The list of Figure 10.5 (lightly shaded) and a free list (heavily shaded).
Arrows show the free-list structure. (b) The result of calling Allocate-Object()
(which returns index 4), setting key[4] to 25, and calling List-Insert(L, 4).
The new free-list head is object 8, which had been next[4] on the free list.
(c) After executing List-Delete(L, 5), we call Free-Object(5). Object 5 becomes
the new free-list head, with object 8 following it on the free list.


Allocate-Object()
1  if free == NIL
2      error “out of space”
3  else x = free
4      free = x.next
5      return x

Free-Object(x)
1  x.next = free
2  free = x

The  free  list  initially  contains  all  n  unallocated  objects.  Once  the  free  list  has  been 
exhausted,  running  the  Allocate-Object  procedure  signals  an  error.  We  can 
even  service  several  linked  lists  with  just  a  single  free  list.  Figure  10.8  shows  two 
linked  lists  and  a  free  list  intertwined  through  key,  next,  and  prev  arrays. 

The two procedures run in O(1) time, which makes them quite practical. We
can  modify  them  to  work  for  any  homogeneous  collection  of  objects  by  letting  any 
one  of  the  attributes  in  the  object  act  like  a  next  attribute  in  the  free  list. 
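
One way to sketch the free list in Python, assuming the multiple-array representation with slots numbered 1 through m and 0 standing for NIL, is shown below. The class name ListStore, the initial chaining of slots in increasing index order, and the use of MemoryError for the “out of space” condition are choices made for this sketch, not details from the text.

```python
NIL = 0  # assumed "no object" value; real slots are numbered 1..m


class ListStore:
    """Arrays key, next, prev of length m plus a singly linked free list."""

    def __init__(self, m):
        self.key = [None] * (m + 1)    # index 0 is unused padding
        self.next = [NIL] * (m + 1)
        self.prev = [NIL] * (m + 1)
        # Initially every slot is free; chain them 1 -> 2 -> ... -> m.
        for i in range(1, m):
            self.next[i] = i + 1
        self.free = 1 if m >= 1 else NIL

    def allocate_object(self):
        """Allocate-Object: pop the head of the free list."""
        if self.free == NIL:
            raise MemoryError("out of space")
        x = self.free
        self.free = self.next[x]
        return x

    def free_object(self, x):
        """Free-Object: push slot x onto the free list."""
        self.next[x] = self.free
        self.free = x
```

Because both operations touch only the head of the free list, the most recently freed slot is the next one handed out, which is the stack-like behavior described above.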

Figure 10.8 Two linked lists, L1 (lightly shaded) and L2 (heavily shaded), and a free list
(darkened) intertwined.

Exercises 


10.3-1

Draw a picture of the sequence ⟨13, 4, 8, 19, 5, 11⟩ stored as a doubly linked
list using the multiple-array representation. Do the same for the single-array
representation.

10.3-2

Write  the  procedures  ALLOCATE-OBJECT  and  FREE-OBJECT  for  a  homogeneous 
collection  of  objects  implemented  by  the  single-array  representation. 

10.3-3

Why don’t we need to set or reset the prev attributes of objects in the
implementation of the Allocate-Object and Free-Object procedures?

10.3-4

It  is  often  desirable  to  keep  all  elements  of  a  doubly  linked  list  compact  in  storage, 
using,  for  example,  the  first  m  index  locations  in  the  multiple-array  representation. 
(This  is  the  case  in  a  paged,  virtual-memory  computing  environment.)  Explain 
how  to  implement  the  procedures  ALLOCATE-OBJECT  and  FREE-OBJECT  so  that 
the  representation  is  compact.  Assume  that  there  are  no  pointers  to  elements  of  the 
linked  list  outside  the  list  itself.  (Hint:  Use  the  array  implementation  of  a  stack.) 


10.3-5 

Let L be a doubly linked list of length n stored in arrays key, prev, and next
of length m. Suppose that these arrays are managed by Allocate-Object and
Free-Object procedures that keep a doubly linked free list F. Suppose further
that of the m items, exactly n are on list L and m - n are on the free list.
Write a procedure Compactify-List(L, F) that, given the list L and the free
list F, moves the items in L so that they occupy array positions 1, 2, ..., n
and adjusts the free list F so that it remains correct, occupying array
positions n + 1, n + 2, ..., m. The running time of your procedure should be
Θ(n), and it should use only a constant amount of extra space. Argue that your
procedure is correct.

10.4 Representing rooted trees

The methods for representing lists given in the previous section extend to any
homogeneous data structure. In this section, we look specifically at the problem of
representing  rooted  trees  by  linked  data  structures.  We  first  look  at  binary  trees, 
and  then  we  present  a  method  for  rooted  trees  in  which  nodes  can  have  an  arbitrary 
number  of  children. 

We  represent  each  node  of  a  tree  by  an  object.  As  with  linked  lists,  we  assume 
that  each  node  contains  a  key  attribute.  The  remaining  attributes  of  interest  are 
pointers  to  other  nodes,  and  they  vary  according  to  the  type  of  tree. 

Binary  trees 

Figure 10.9 shows how we use the attributes p, left, and right to store
pointers to the parent, left child, and right child of each node in a binary
tree T. If x.p = NIL, then x is the root. If node x has no left child, then
x.left = NIL, and similarly for the right child. The root of the entire tree T
is pointed to by the attribute T.root. If T.root = NIL, then the tree is empty.

Rooted  trees  with  unbounded  branching 

We can extend the scheme for representing a binary tree to any class of trees in
which the number of children of each node is at most some constant k: we replace
the left and right attributes by child_1, child_2, ..., child_k. This scheme no
longer works when the number of children of a node is unbounded, since we do not
know how many attributes (arrays in the multiple-array representation) to
allocate in advance. Moreover, even if the number of children k is bounded by a
large constant but most nodes have a small number of children, we may waste a
lot of memory.

Fortunately, there is a clever scheme to represent trees with arbitrary numbers
of children. It has the advantage of using only O(n) space for any n-node rooted
tree. The left-child, right-sibling representation appears in Figure 10.10. As
before, each node contains a parent pointer p, and T.root points to the root of
tree T. Instead of having a pointer to each of its children, however, each
node x has only two pointers:

1. x.left-child points to the leftmost child of node x, and

2. x.right-sibling points to the sibling of x immediately to its right.

If node x has no children, then x.left-child = NIL, and if node x is the
rightmost child of its parent, then x.right-sibling = NIL.
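
A small Python sketch of the left-child, right-sibling representation follows. The attribute names mirror p, left-child, and right-sibling from the text, with None standing in for NIL; the class name TreeNode and the helpers add_child and children are hypothetical and exist only to show how the two pointers are followed.

```python
class TreeNode:
    """Node with a parent pointer and the two pointers described above."""

    def __init__(self, key):
        self.key = key
        self.p = None               # parent (None for the root)
        self.left_child = None      # leftmost child, or None
        self.right_sibling = None   # sibling immediately to the right, or None


def add_child(parent, child):
    """Hypothetical helper: make child the new leftmost child of parent."""
    child.p = parent
    child.right_sibling = parent.left_child
    parent.left_child = child


def children(x):
    """Collect x's children, left to right, by following right_sibling."""
    out = []
    c = x.left_child
    while c is not None:
        out.append(c)
        c = c.right_sibling
    return out
```

Each node stores three pointers no matter how many children it has, which is why the total space stays O(n); the cost is that reaching a particular child takes time proportional to its position among its siblings.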



Figure 10.9 The representation of a binary tree T. Each node x has the attributes x.p (top), x.left
(lower left), and x.right (lower right). The key attributes are not shown.


Figure 10.10 The left-child, right-sibling representation of a tree T. Each node x has attributes x.p
(top), x.left-child (lower left), and x.right-sibling (lower right). The key attributes are not shown.

Other tree representations

We  sometimes  represent  rooted  trees  in  other  ways.  In  Chapter  6,  for  example, 
we  represented  a  heap,  which  is  based  on  a  complete  binary  tree,  by  a  single  array 
plus  the  index  of  the  last  node  in  the  heap.  The  trees  that  appear  in  Chapter  21  are 
traversed  only  toward  the  root,  and  so  only  the  parent  pointers  are  present;  there 
are  no  pointers  to  children.  Many  other  schemes  are  possible.  Which  scheme  is 
best  depends  on  the  application. 

Exercises 


10.4-1 

Draw the binary tree rooted at index 6 that is represented by the following
attributes:

index   key   left   right
  1      12     7      3
  2      15     8     NIL
  3       4    10     NIL
  4      10     5      9
  5       2    NIL    NIL
  6      18     1      4
  7       7    NIL    NIL
  8      14     6      2
  9      21    NIL    NIL
 10       5    NIL    NIL

10.4-2 


Write an O(n)-time recursive procedure that, given an n-node binary tree, prints
out  the  key  of  each  node  in  the  tree. 


10.4-3 

Write an O(n)-time nonrecursive procedure that, given an n-node binary tree,
prints  out  the  key  of  each  node  in  the  tree.  Use  a  stack  as  an  auxiliary  data  structure. 


10.4-4 

Write an O(n)-time procedure that prints all the keys of an arbitrary rooted
tree with n nodes, where the tree is stored using the left-child, right-sibling
representation.


10.4-5  * 

Write an O(n)-time nonrecursive procedure that, given an n-node binary tree,
prints out the key of each node. Use no more than constant extra space outside
of the tree itself and do not modify the tree, even temporarily, during the
procedure.


10.4-6  * 

The  left-child,  right-sibling  representation  of  an  arbitrary  rooted  tree  uses  three 
pointers  in  each  node:  left-child,  right-sibling,  and  parent.  From  any  node,  its 
parent  can  be  reached  and  identified  in  constant  time  and  all  its  children  can  be 
reached and identified in time linear in the number of children. Show how to use
only  two  pointers  and  one  boolean  value  in  each  node  so  that  the  parent  of  a  node 
or  all  of  its  children  can  be  reached  and  identified  in  time  linear  in  the  number  of 
children. 


Problems 


10-1  Comparisons  among  lists 

For  each  of  the  four  types  of  lists  in  the  following  table,  what  is  the  asymptotic 
worst-case  running  time  for  each  dynamic-set  operation  listed? 


                     unsorted,       sorted,         unsorted,       sorted,
                     singly linked   singly linked   doubly linked   doubly linked
Search(L, k)
Insert(L, x)
Delete(L, x)
Successor(L, x)
Predecessor(L, x)
Minimum(L)
Maximum(L)

10-2 Mergeable heaps using linked lists

A mergeable heap supports the following operations: Make-Heap (which creates
an empty mergeable heap), Insert, Minimum, Extract-Min, and Union.¹
Show  how  to  implement  mergeable  heaps  using  linked  lists  in  each  of  the  following 
cases.  Try  to  make  each  operation  as  efficient  as  possible.  Analyze  the  running 
time  of  each  operation  in  terms  of  the  size  of  the  dynamic  set(s)  being  operated  on. 

a.  Lists  are  sorted. 

b.  Lists  are  unsorted. 

c.  Lists  are  unsorted,  and  dynamic  sets  to  be  merged  are  disjoint. 

10-3  Searching  a  sorted  compact  list 

Exercise 10.3-4 asked how we might maintain an n-element list compactly in the
first n positions of an array. We shall assume that all keys are distinct and
that the compact list is also sorted, that is, key[i] < key[next[i]] for all
i = 1, 2, ..., n such that next[i] ≠ NIL. We will also assume that we have a
variable L that contains the index of the first element on the list. Under these
assumptions, you will show that we can use the following randomized algorithm to
search the list in O(√n) expected time.

Compact-List-Search(L, n, k)
 1  i = L
 2  while i ≠ NIL and key[i] < k
 3      j = Random(1, n)
 4      if key[i] < key[j] and key[j] ≤ k
 5          i = j
 6          if key[i] == k
 7              return i
 8      i = next[i]
 9  if i == NIL or key[i] > k
10      return NIL
11  else return i

If we ignore lines 3-7 of the procedure, we have an ordinary algorithm for
searching a sorted linked list, in which index i points to each position of the
list in turn. The search terminates once the index i “falls off” the end of the
list or once key[i] ≥ k. In the latter case, if key[i] = k, clearly we have
found a key with the value k. If, however, key[i] > k, then we will never find a
key with the value k, and so terminating the search was the right thing to do.

¹Because we have defined a mergeable heap to support Minimum and Extract-Min, we can also
refer to it as a mergeable min-heap. Alternatively, if it supported Maximum and Extract-Max,
it would be a mergeable max-heap.

Lines 3-7 attempt to skip ahead to a randomly chosen position j. Such a skip
benefits us if key[j] is larger than key[i] and no larger than k; in such a
case, j marks a position in the list that i would have to reach during an
ordinary list search. Because the list is compact, we know that any choice of j
between 1 and n indexes some object in the list rather than a slot on the free
list.
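
For readers who want to experiment with the procedure, it might be transcribed into Python roughly as follows. This sketch assumes 1-based arrays with an unused slot at index 0, uses 0 for NIL, and passes the arrays as explicit arguments; the parameter rng and the sample data in build_sample are hypothetical additions for testing and are not part of the problem statement.

```python
import random

NIL = 0  # assumed "no object" value; list elements occupy slots 1..n


def compact_list_search(key, next_, L, n, k, rng=random):
    """Search a sorted, compact list for key k; return its index or NIL."""
    i = L
    while i != NIL and key[i] < k:
        j = rng.randint(1, n)                 # Random(1, n), inclusive
        if key[i] < key[j] and key[j] <= k:   # a useful skip ahead
            i = j
            if key[i] == k:
                return i
        i = next_[i]
    if i == NIL or key[i] > k:
        return NIL
    return i


def build_sample():
    """Hypothetical data: sorted order 10, 20, 30, 40, 50 in shuffled slots."""
    key = [None, 30, 10, 50, 20, 40]
    next_ = [NIL, 5, 4, NIL, 1, 3]
    L = 2                                     # slot holding the smallest key
    return key, next_, L, 5
```

Whatever values the random source returns, the index i only ever moves forward in list order, so the search returns the same answer as an ordinary scan; the random choices affect only the running time.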

Instead of analyzing the performance of Compact-List-Search directly, we
shall analyze a related algorithm, Compact-List-Search', which executes two
separate loops. This algorithm takes an additional parameter t, which determines
an upper bound on the number of iterations of the first loop.

Compact-List-Search'(L, n, k, t)
 1  i = L
 2  for q = 1 to t
 3      j = Random(1, n)
 4      if key[i] < key[j] and key[j] ≤ k
 5          i = j
 6          if key[i] == k
 7              return i
 8  while i ≠ NIL and key[i] < k
 9      i = next[i]
10  if i == NIL or key[i] > k
11      return NIL
12  else return i

To compare the execution of the algorithms Compact-List-Search(L, n, k)
and Compact-List-Search'(L, n, k, t), assume that the sequence of integers
returned by the calls of Random(1, n) is the same for both algorithms.

a. Suppose that Compact-List-Search(L, n, k) takes t iterations of the while
   loop of lines 2-8. Argue that Compact-List-Search'(L, n, k, t) returns the
   same answer and that the total number of iterations of both the for and while
   loops within Compact-List-Search' is at least t.

In the call Compact-List-Search'(L, n, k, t), let X_t be the random variable
that describes the distance in the linked list (that is, through the chain of
next pointers) from position i to the desired key k after t iterations of the
for loop of lines 2-7 have occurred.

b. Argue that the expected running time of Compact-List-Search'(L, n, k, t)
   is $O(t + E[X_t])$.

c. Show that $E[X_t] \le \sum_{r=1}^{n} (1 - r/n)^t$. (Hint: Use equation (C.25).)

d. Show that $\sum_{r=0}^{n-1} r^t \le n^{t+1}/(t + 1)$.

e. Prove that $E[X_t] \le n/(t + 1)$.

f. Show that Compact-List-Search'(L, n, k, t) runs in $O(t + n/t)$ expected
   time.

g. Conclude that Compact-List-Search runs in $O(\sqrt{n})$ expected time.

h. Why do we assume that all keys are distinct in Compact-List-Search? Argue
   that random skips do not necessarily help asymptotically when the list
   contains repeated key values.


Chapter  notes 

Aho, Hopcroft, and Ullman [6] and Knuth [209] are excellent references for
elementary data structures. Many other texts cover both basic data structures
and their implementation in a particular programming language. Examples of these
types of textbooks include Goodrich and Tamassia [147], Main [241], Shaffer
[311], and Weiss [352, 353, 354]. Gonnet [145] provides experimental data on the
performance of many data-structure operations.

The origin of stacks and queues as data structures in computer science is
unclear, since corresponding notions already existed in mathematics and
paper-based business practices before the introduction of digital computers.
Knuth [209] cites A. M. Turing for the development of stacks for subroutine
linkage in 1947.

Pointer-based data structures also seem to be a folk invention. According to
Knuth, pointers were apparently used in early computers with drum memories. The
A-1 language developed by G. M. Hopper in 1951 represented algebraic formulas
as binary trees. Knuth credits the IPL-II language, developed in 1956 by
A. Newell, J. C. Shaw, and H. A. Simon, for recognizing the importance and
promoting the use of pointers. Their IPL-III language, developed in 1957,
included explicit stack operations.

Many applications require a dynamic set that supports only the dictionary
operations Insert, Search, and Delete. For example, a compiler that translates a
programming language maintains a symbol table, in which the keys of elements
are arbitrary character strings corresponding to identifiers in the language. A
hash table is an effective data structure for implementing dictionaries.
Although searching for an element in a hash table can take as long as searching
for an element in a linked list (O(n) time in the worst case), in practice
hashing performs extremely well. Under reasonable assumptions, the average time
to search for an element in a hash table is O(1).

A hash table generalizes the simpler notion of an ordinary array. Directly
addressing into an ordinary array makes effective use of our ability to examine
an arbitrary position in an array in O(1) time. Section 11.1 discusses direct
addressing in more detail. We can take advantage of direct addressing when we
can afford to allocate an array that has one position for every possible key.

When  the  number  of  keys  actually  stored  is  small  relative  to  the  total  number  of 
possible  keys,  hash  tables  become  an  effective  alternative  to  directly  addressing  an 
array,  since  a  hash  table  typically  uses  an  array  of  size  proportional  to  the  number 
of  keys  actually  stored.  Instead  of  using  the  key  as  an  array  index  directly,  the  array 
index  is  computed  from  the  key.  Section  11.2  presents  the  main  ideas,  focusing  on 
“chaining”  as  a  way  to  handle  “collisions,”  in  which  more  than  one  key  maps  to  the 
same  array  index.  Section  11.3  describes  how  we  can  compute  array  indices  from 
keys  using  hash  functions.  We  present  and  analyze  several  variations  on  the  basic 
theme.  Section  11.4  looks  at  “open  addressing,”  which  is  another  way  to  deal  with 
collisions.  The  bottom  line  is  that  hashing  is  an  extremely  effective  and  practical 
technique: the basic dictionary operations require only O(1) time on the average.
Section 11.5 explains how “perfect hashing” can support searches in O(1)
worst-case time, when the set of keys being stored is static (that is, when the
set of keys never changes once stored).

11.1 Direct-address tables

Direct addressing is a simple technique that works well when the universe U of
keys is reasonably small. Suppose that an application needs a dynamic set in
which each element has a key drawn from the universe U = {0, 1, ..., m - 1},
where m is not too large. We shall assume that no two elements have the same
key.

To represent the dynamic set, we use an array, or direct-address table, denoted
by T[0..m - 1], in which each position, or slot, corresponds to a key in the
universe U. Figure 11.1 illustrates the approach; slot k points to an element in
the set with key k. If the set contains no element with key k, then T[k] = NIL.

The  dictionary  operations  are  trivial  to  implement: 

Direct-Address-Search(T, k)
1  return T[k]

Direct-Address-Insert(T, x)
1  T[x.key] = x

Direct-Address-Delete(T, x)
1  T[x.key] = NIL

Each of these operations takes only O(1) time.
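
As a rough Python sketch of these three operations, assuming elements are objects with a key attribute and that None stands in for NIL (the class names Element and DirectAddressTable and the data field are hypothetical):

```python
class Element:
    """An element with a key in {0, ..., m - 1} and optional satellite data."""

    def __init__(self, key, data=None):
        self.key = key
        self.data = data


class DirectAddressTable:
    """Table T[0..m-1]; slot k holds the element with key k, or None."""

    def __init__(self, m):
        self.T = [None] * m

    def search(self, k):
        """Direct-Address-Search: one array lookup."""
        return self.T[k]

    def insert(self, x):
        """Direct-Address-Insert: store x in the slot named by its key."""
        self.T[x.key] = x

    def delete(self, x):
        """Direct-Address-Delete: clear the slot named by x's key."""
        self.T[x.key] = None
```

Each method performs a single indexed read or write, which is where the O(1) bound comes from; the price is one slot for every key in the universe, whether or not it is in use.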


Figure 11.1 How to implement a dynamic set by a direct-address table T. Each key in the universe
U = {0, 1, ..., 9} corresponds to an index in the table. The set K = {2, 3, 5, 8} of actual keys
determines the slots in the table that contain pointers to elements. The other slots, heavily shaded,
contain NIL.

For some applications, the direct-address table itself can hold the elements in the
dynamic  set.  That  is,  rather  than  storing  an  element’s  key  and  satellite  data  in  an 
object  external  to  the  direct-address  table,  with  a  pointer  from  a  slot  in  the  table  to 
the  object,  we  can  store  the  object  in  the  slot  itself,  thus  saving  space.  We  would 
use  a  special  key  within  an  object  to  indicate  an  empty  slot.  Moreover,  it  is  often 
unnecessary  to  store  the  key  of  the  object,  since  if  we  have  the  index  of  an  object 
in  the  table,  we  have  its  key.  If  keys  are  not  stored,  however,  we  must  have  some 
way  to  tell  whether  the  slot  is  empty. 

Exercises 


11.1-1 

Suppose  that  a  dynamic  set  S  is  represented  by  a  direct-address  table  T  of  length  m. 
Describe  a  procedure  that  finds  the  maximum  element  of  S.  What  is  the  worst-case 
performance  of  your  procedure? 


11.1-2 

A bit vector is simply an array of bits (0s and 1s). A bit vector of length m takes
much  less  space  than  an  array  of  m  pointers.  Describe  how  to  use  a  bit  vector 
to  represent  a  dynamic  set  of  distinct  elements  with  no  satellite  data.  Dictionary 
operations should run in O(1) time.


11.1-3

Suggest how to implement a direct-address table in which the keys of stored
elements do not need to be distinct and the elements can have satellite data. All
three dictionary operations (Insert, Delete, and Search) should run in O(1)
time. (Don’t forget that Delete takes as an argument a pointer to an object to be
deleted, not a key.)


11.1-4 *

We wish to implement a dictionary by using direct addressing on a huge array. At
the start, the array entries may contain garbage, and initializing the entire array
is impractical because of its size. Describe a scheme for implementing a direct-address
dictionary on a huge array. Each stored object should use O(1) space;
the operations Search, Insert, and Delete should take O(1) time each; and
initializing the data structure should take O(1) time. (Hint: Use an additional array,
treated somewhat like a stack whose size is the number of keys actually stored in
the dictionary, to help determine whether a given entry in the huge array is valid or
not.)

11.2 Hash tables

The  downside  of  direct  addressing  is  obvious:  if  the  universe  U  is  large,  storing 
a  table  T  of  size  |  U  |  may  be  impractical,  or  even  impossible,  given  the  memory 
available  on  a  typical  computer.  Furthermore,  the  set  K  of  keys  actually  stored 
may  be  so  small  relative  to  U  that  most  of  the  space  allocated  for  T  would  be 
wasted. 

When the set K of keys stored in a dictionary is much smaller than the universe
U of all possible keys, a hash table requires much less storage than a direct-address
table. Specifically, we can reduce the storage requirement to Θ(|K|) while
we maintain the benefit that searching for an element in the hash table still requires
only O(1) time. The catch is that this bound is for the average-case time, whereas
for direct addressing it holds for the worst-case time.

With direct addressing, an element with key k is stored in slot k. With hashing,
this element is stored in slot h(k); that is, we use a hash function h to compute the
slot from the key k. Here, h maps the universe U of keys into the slots of a hash
table T[0 . . m − 1]:

h : U → {0, 1, ..., m − 1} ,

where the size m of the hash table is typically much less than |U|. We say that an
element with key k hashes to slot h(k); we also say that h(k) is the hash value of
key k. Figure 11.2 illustrates the basic idea. The hash function reduces the range
of array indices and hence the size of the array. Instead of a size of |U|, the array
can have size m.


Figure 11.2 Using a hash function h to map keys to hash-table slots. Because keys k_2 and k_5 map
to the same slot, they collide.

Figure 11.3 Collision resolution by chaining. Each hash-table slot T[j] contains a linked list of
all the keys whose hash value is j. For example, h(k_1) = h(k_4) and h(k_5) = h(k_7) = h(k_2).
The linked list can be either singly or doubly linked; we show it as doubly linked because deletion is
faster that way.

There  is  one  hitch:  two  keys  may  hash  to  the  same  slot.  We  call  this  situation 
a  collision.  Fortunately,  we  have  effective  techniques  for  resolving  the  conflict 
created  by  collisions. 

Of  course,  the  ideal  solution  would  be  to  avoid  collisions  altogether.  We  might 
try  to  achieve  this  goal  by  choosing  a  suitable  hash  function  h.  One  idea  is  to 
make  h  appear  to  be  “random,”  thus  avoiding  collisions  or  at  least  minimizing 
their  number.  The  very  term  “to  hash,”  evoking  images  of  random  mixing  and 
chopping,  captures  the  spirit  of  this  approach.  (Of  course,  a  hash  function  h  must  be 
deterministic  in  that  a  given  input  k  should  always  produce  the  same  output  h(k).) 
Because |U| > m, however, there must be at least two keys that have the same hash
value; avoiding collisions altogether is therefore impossible. Thus, while a well-designed,
“random”-looking hash function can minimize the number of collisions,
we still need a method for resolving the collisions that do occur.

The remainder of this section presents the simplest collision resolution technique,
called chaining. Section 11.4 introduces an alternative method for resolving
collisions, called open addressing.

Collision  resolution  by  chaining 

In  chaining,  we  place  all  the  elements  that  hash  to  the  same  slot  into  the  same 
linked  list,  as  Figure  11.3  shows.  Slot  j  contains  a  pointer  to  the  head  of  the  list  of 
all stored elements that hash to j; if there are no such elements, slot j contains NIL.

The dictionary operations on a hash table T are easy to implement when collisions
are resolved by chaining:

Chained-Hash-Insert(T, x)

    insert x at the head of list T[h(x.key)]

Chained-Hash-Search(T, k)

    search for an element with key k in list T[h(k)]

Chained-Hash-Delete(T, x)

    delete x from the list T[h(x.key)]

The worst-case running time for insertion is O(1). The insertion procedure is fast
in part because it assumes that the element x being inserted is not already present in
the table; if necessary, we can check this assumption (at additional cost) by searching
for an element whose key is x.key before we insert. For searching, the worst-case
running time is proportional to the length of the list; we shall analyze this
operation more closely below. We can delete an element in O(1) time if the lists
are doubly linked, as Figure 11.3 depicts. (Note that Chained-Hash-Delete
takes as input an element x and not its key k, so that we don’t have to search for x
first. If the hash table supports deletion, then its linked lists should be doubly linked
so that we can delete an item quickly. If the lists were only singly linked, then to
delete element x, we would first have to find x in the list T[h(x.key)] so that we
could update the next attribute of x’s predecessor. With singly linked lists, both
deletion and searching would have the same asymptotic running times.)
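
The following Python sketch shows one way the three chained operations might look with doubly linked lists. The names Node and ChainedHashTable are ours, and the default hash function k mod m is only a placeholder assumption; any function mapping keys to {0, 1, ..., m − 1} could be passed in instead.

```python
class Node:
    """List node holding a key, optional satellite data, and both links."""

    def __init__(self, key, data=None):
        self.key = key
        self.data = data
        self.prev = None
        self.next = None


class ChainedHashTable:
    def __init__(self, m, h=None):
        self.m = m
        # Assumption: fall back to k mod m when no hash function is supplied.
        self.h = h if h is not None else (lambda k: k % m)
        self.slots = [None] * m        # each slot is the head of a list, or None

    def insert(self, x):
        # Chained-Hash-Insert: splice x in at the head of its list, O(1).
        j = self.h(x.key)
        head = self.slots[j]
        x.prev = None
        x.next = head
        if head is not None:
            head.prev = x
        self.slots[j] = x

    def search(self, k):
        # Chained-Hash-Search: walk the list stored in slot h(k).
        x = self.slots[self.h(k)]
        while x is not None and x.key != k:
            x = x.next
        return x

    def delete(self, x):
        # Chained-Hash-Delete: unlink x through its own prev/next links, O(1).
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.slots[self.h(x.key)] = x.next
        if x.next is not None:
            x.next.prev = x.prev
        x.prev = x.next = None


table = ChainedHashTable(5)
for key in (1, 6, 11, 3):              # keys 1, 6, and 11 all land in slot 1
    table.insert(Node(key))
```

Because delete receives the node itself, it never walks the list; with singly linked nodes it would first have to find the predecessor of x.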

Analysis  of  hashing  with  chaining 

How  well  does  hashing  with  chaining  perform?  In  particular,  how  long  does  it  take 
to  search  for  an  element  with  a  given  key? 

Given a hash table T with m slots that stores n elements, we define the load
factor α for T as n/m, that is, the average number of elements stored in a chain.
Our analysis will be in terms of α, which can be less than, equal to, or greater
than 1.

The worst-case behavior of hashing with chaining is terrible: all n keys hash
to the same slot, creating a list of length n. The worst-case time for searching is
thus Θ(n) plus the time to compute the hash function, no better than if we used
one linked list for all the elements. Clearly, we do not use hash tables for their
worst-case performance. (Perfect hashing, described in Section 11.5, does provide
good worst-case performance when the set of keys is static, however.)

The average-case performance of hashing depends on how well the hash function
h distributes the set of keys to be stored among the m slots, on the average.
Section 11.3 discusses these issues, but for now we shall assume that any given
element is equally likely to hash into any of the m slots, independently of where
any other element has hashed to. We call this the assumption of simple uniform
hashing.

For j = 0, 1, ..., m − 1, let us denote the length of the list T[j] by n_j, so that

n = n_0 + n_1 + ··· + n_{m−1} ,    (11.1)

and the expected value of n_j is E[n_j] = α = n/m.

We assume that O(1) time suffices to compute the hash value h(k), so that
the time required to search for an element with key k depends linearly on the
length n_{h(k)} of the list T[h(k)]. Setting aside the O(1) time required to compute
the hash function and to access slot h(k), let us consider the expected number of
elements examined by the search algorithm, that is, the number of elements in the
list T[h(k)] that the algorithm checks to see whether any have a key equal to k. We
shall consider two cases. In the first, the search is unsuccessful: no element in the
table has key k. In the second, the search successfully finds an element with key k.

Theorem  11.1 

In a hash table in which collisions are resolved by chaining, an unsuccessful search
takes average-case time Θ(1 + α), under the assumption of simple uniform hashing.


Proof Under the assumption of simple uniform hashing, any key k not already
stored in the table is equally likely to hash to any of the m slots. The expected time
to search unsuccessfully for a key k is the expected time to search to the end of
list T[h(k)], which has expected length E[n_{h(k)}] = α. Thus, the expected number
of elements examined in an unsuccessful search is α, and the total time required
(including the time for computing h(k)) is Θ(1 + α). ■

The situation for a successful search is slightly different, since each list is not
equally likely to be searched. Instead, the probability that a list is searched is proportional
to the number of elements it contains. Nonetheless, the expected search
time still turns out to be Θ(1 + α).

Theorem  11.2 

In a hash table in which collisions are resolved by chaining, a successful search
takes average-case time Θ(1 + α), under the assumption of simple uniform hashing.


Proof We assume that the element being searched for is equally likely to be any
of the n elements stored in the table. The number of elements examined during a
successful search for an element x is one more than the number of elements that
appear before x in x’s list. Because new elements are placed at the front of the
list, elements before x in the list were all inserted after x was inserted. To find
the expected number of elements examined, we take the average, over the n elements
x in the table, of 1 plus the expected number of elements added to x’s list
after x was added to the list. Let x_i denote the ith element inserted into the table,
for i = 1, 2, ..., n, and let k_i = x_i.key. For keys k_i and k_j, we define the
indicator random variable X_ij = I{h(k_i) = h(k_j)}. Under the assumption of simple
uniform hashing, we have Pr{h(k_i) = h(k_j)} = 1/m, and so by Lemma 5.1,
E[X_ij] = 1/m. Thus, the expected number of elements examined in a successful
search is

\[
\begin{aligned}
E\left[\frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}X_{ij}\right)\right]
&= \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}E[X_{ij}]\right) && \text{(by linearity of expectation)} \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}\frac{1}{m}\right) \\
&= 1+\frac{1}{nm}\sum_{i=1}^{n}(n-i) \\
&= 1+\frac{1}{nm}\left(\sum_{i=1}^{n}n-\sum_{i=1}^{n}i\right) \\
&= 1+\frac{1}{nm}\left(n^{2}-\frac{n(n+1)}{2}\right) && \text{(by equation (A.1))} \\
&= 1+\frac{n-1}{2m} \\
&= 1+\frac{\alpha}{2}-\frac{\alpha}{2n} .
\end{aligned}
\]

Thus, the total time required for a successful search (including the time for computing
the hash function) is Θ(2 + α/2 − α/2n) = Θ(1 + α). ■


What does this analysis mean? If the number of hash-table slots is at least proportional
to the number of elements in the table, we have n = O(m) and, consequently,
α = n/m = O(m)/m = O(1). Thus, searching takes constant time
on average. Since insertion takes O(1) worst-case time and deletion takes O(1)
worst-case time when the lists are doubly linked, we can support all dictionary
operations in O(1) time on average.
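
As an informal check on Theorem 11.2, the sketch below simulates simple uniform hashing by assigning each of n insertions a uniformly random slot, and then compares the average number of elements examined in a successful search with 1 + α/2 − α/(2n), where α = n/m is the load factor. The function name, the parameter values, and the seed are arbitrary choices of ours; the simulation illustrates the bound and proves nothing.

```python
import random


def average_successful_search_cost(n, m, rng):
    """Average number of elements examined over all n stored elements.

    Each insertion lands in a uniformly random slot (simple uniform
    hashing). With insertion at the head of the list, the i-th inserted
    element sits behind every later element that landed in the same slot.
    """
    slots = [rng.randrange(m) for _ in range(n)]   # slot of the i-th insertion
    later_in_slot = [0] * m     # per slot: elements seen so far in the reverse scan
    examined = 0
    # Scan insertions from last to first, so the counter holds exactly the
    # elements inserted after the current one.
    for j in reversed(slots):
        examined += 1 + later_in_slot[j]
        later_in_slot[j] += 1
    return examined / n


n, m = 2000, 500
alpha = n / m
rng = random.Random(2024)       # fixed seed so that the run is repeatable
trials = 20
simulated = sum(average_successful_search_cost(n, m, rng) for _ in range(trials)) / trials
predicted = 1 + alpha / 2 - alpha / (2 * n)
print(f"simulated {simulated:.3f}, predicted {predicted:.3f}")
```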

Exercises


11.2-1 

Suppose  we  use  a  hash  function  h  to  hash  n  distinct  keys  into  an  array  T  of 
length  m.  Assuming  simple  uniform  hashing,  what  is  the  expected  number  of 
collisions? More precisely, what is the expected cardinality of {{k, l} : k ≠ l and
h(k) = h(l)}?


11.2-2 

Demonstrate  what  happens  when  we  insert  the  keys  5, 28, 19, 15, 20,  33,  12,  17,  10 
into  a  hash  table  with  collisions  resolved  by  chaining.  Let  the  table  have  9  slots, 
and  let  the  hash  function  be  h(k)  =  k  mod  9. 


11.2-3 

Professor Marley hypothesizes that he can obtain substantial performance gains by
modifying the chaining scheme to keep each list in sorted order. How does the
professor’s modification affect the running time for successful searches, unsuccessful
searches, insertions, and deletions?


11.2-4 

Suggest  how  to  allocate  and  deallocate  storage  for  elements  within  the  hash  table 
itself  by  linking  all  unused  slots  into  a  free  list.  Assume  that  one  slot  can  store 
a  flag  and  either  one  element  plus  a  pointer  or  two  pointers.  All  dictionary  and 
free-list operations should run in O(1) expected time. Does the free list need to be
doubly  linked,  or  does  a  singly  linked  free  list  suffice? 


11.2-5 

Suppose that we are storing a set of n keys into a hash table of size m. Show that if
the keys are drawn from a universe U with |U| > nm, then U has a subset of size n
consisting of keys that all hash to the same slot, so that the worst-case searching
time for hashing with chaining is Θ(n).


11.2-6 

Suppose  we  have  stored  n  keys  in  a  hash  table  of  size  m,  with  collisions  resolved  by 
chaining,  and  that  we  know  the  length  of  each  chain,  including  the  length  L  of  the 
longest  chain.  Describe  a  procedure  that  selects  a  key  uniformly  at  random  from 
among the keys in the hash table and returns it in expected time O(L · (1 + 1/α)).

11.3 Hash functions

In this section, we discuss some issues regarding the design of good hash functions
and then present three schemes for their creation. Two of the schemes, hashing by
division and hashing by multiplication, are heuristic in nature, whereas the third
scheme, universal hashing, uses randomization to provide provably good performance.

What  makes  a  good  hash  function? 

A  good  hash  function  satisfies  (approximately)  the  assumption  of  simple  uniform 
hashing:  each  key  is  equally  likely  to  hash  to  any  of  the  m  slots,  independently  of 
where  any  other  key  has  hashed  to.  Unfortunately,  we  typically  have  no  way  to 
check  this  condition,  since  we  rarely  know  the  probability  distribution  from  which 
the  keys  are  drawn.  Moreover,  the  keys  might  not  be  drawn  independently. 

Occasionally we do know the distribution. For example, if we know that the
keys are random real numbers k independently and uniformly distributed in the
range 0 ≤ k < 1, then the hash function

h(k) = ⌊km⌋

satisfies the condition of simple uniform hashing.

In  practice,  we  can  often  employ  heuristic  techniques  to  create  a  hash  function 
that  performs  well.  Qualitative  information  about  the  distribution  of  keys  may  be 
useful  in  this  design  process.  For  example,  consider  a  compiler’s  symbol  table,  in 
which  the  keys  are  character  strings  representing  identifiers  in  a  program.  Closely 
related  symbols,  such  as  pt  and  pts,  often  occur  in  the  same  program.  A  good 
hash function would minimize the chance that such variants hash to the same slot.

A good approach derives the hash value in a way that we expect to be independent
of any patterns that might exist in the data. For example, the “division method”
(discussed  in  Section  11.3.1)  computes  the  hash  value  as  the  remainder  when  the 
key  is  divided  by  a  specified  prime  number.  This  method  frequently  gives  good 
results,  assuming  that  we  choose  a  prime  number  that  is  unrelated  to  any  patterns 
in  the  distribution  of  keys. 

Finally,  we  note  that  some  applications  of  hash  functions  might  require  stronger 
properties  than  are  provided  by  simple  uniform  hashing.  For  example,  we  might 
want  keys  that  are  “close”  in  some  sense  to  yield  hash  values  that  are  far  apart. 
(This  property  is  especially  desirable  when  we  are  using  linear  probing,  defined  in 
Section  11.4.)  Universal  hashing,  described  in  Section  11.3.3,  often  provides  the 
desired  properties. 

Interpreting keys as natural numbers

Most hash functions assume that the universe of keys is the set N = {0, 1, 2, ...}
of natural numbers. Thus, if the keys are not natural numbers, we find a way to
interpret them as natural numbers. For example, we can interpret a character string
as an integer expressed in suitable radix notation. Thus, we might interpret the
identifier pt as the pair of decimal integers (112, 116), since p = 112 and t = 116
in the ASCII character set; then, expressed as a radix-128 integer, pt becomes
(112 · 128) + 116 = 14452. In the context of a given application, we can usually
devise some such method for interpreting each key as a (possibly large) natural
number. In what follows, we assume that the keys are natural numbers.
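
The radix interpretation can be sketched in a few lines of Python. The function name string_to_natural is our own, and we assume that every character code is smaller than the radix, as is the case for ASCII text in radix 128.

```python
def string_to_natural(s, radix=128):
    """Interpret a character string as a natural number in the given radix."""
    value = 0
    for ch in s:
        code = ord(ch)
        if code >= radix:
            raise ValueError(f"character {ch!r} does not fit in radix {radix}")
        value = value * radix + code     # most significant character first
    return value


print(string_to_natural("pt"))    # (112 * 128) + 116 = 14452
```

Python integers grow as needed, so long strings simply produce large numbers; a fixed-width language would need multiword arithmetic or a running reduction instead.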

11.3.1  The  division  method 

In  the  division  method  for  creating  hash  functions,  we  map  a  key  k  into  one  of  m 
slots  by  taking  the  remainder  of  k  divided  by  m.  That  is,  the  hash  function  is 

h(k) = k mod m .

For example, if the hash table has size m = 12 and the key is k = 100, then
h(k) = 4. Since it requires only a single division operation, hashing by division is
quite fast.

When using the division method, we usually avoid certain values of m. For
example, m should not be a power of 2, since if m = 2^p, then h(k) is just the p
lowest-order bits of k. Unless we know that all low-order p-bit patterns are equally
likely, we are better off designing the hash function to depend on all the bits of the
key. As Exercise 11.3-3 asks you to show, choosing m = 2^p − 1 when k is a
character string interpreted in radix 2^p may be a poor choice, because permuting
the characters of k does not change its hash value.

A  prime  not  too  close  to  an  exact  power  of  2  is  often  a  good  choice  for  m.  For 
example,  suppose  we  wish  to  allocate  a  hash  table,  with  collisions  resolved  by 
chaining,  to  hold  roughly  n  =  2000  character  strings,  where  a  character  has  8  bits. 
We  don’t  mind  examining  an  average  of  3  elements  in  an  unsuccessful  search,  and 
so  we  allocate  a  hash  table  of  size  m  =  701.  We  could  choose  m  =  701  because 
it  is  a  prime  near  2000/3  but  not  near  any  power  of  2.  Treating  each  key  k  as  an 
integer,  our  hash  function  would  be 

h(k) = k mod 701 .
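
A short Python sketch makes the warning about powers of 2 concrete. The function name division_hash and the three sample keys are ours, picked so that the keys share their four low-order bits.

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m for a natural-number key k."""
    return k % m


print(division_hash(100, 12))            # the example from the text: 4

# With m = 2**p the hash keeps only the p lowest-order bits of the key,
# so keys that differ only in their higher bits collide.
p = 4
keys = [0b0001_0110, 0b1111_0110, 0b1010_0110]     # same low 4 bits
print([division_hash(k, 2 ** p) for k in keys])    # every key hashes to 6
print([division_hash(k, 701) for k in keys])       # a prime modulus separates them
```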

11.3.2  The  multiplication  method 

The multiplication method for creating hash functions operates in two steps. First,
we multiply the key k by a constant A in the range 0 < A < 1 and extract the
fractional part of kA. Then, we multiply this value by m and take the floor of the
result. In short, the hash function is

h(k) = ⌊m (kA mod 1)⌋ ,

where “kA mod 1” means the fractional part of kA, that is, kA − ⌊kA⌋.

Figure 11.4 The multiplication method of hashing. The w-bit representation of the key k is multiplied
by the w-bit value s = A · 2^w. The p highest-order bits of the lower w-bit half of the product
form the desired hash value h(k).

An advantage of the multiplication method is that the value of m is not critical.
We typically choose it to be a power of 2 (m = 2^p for some integer p), since we
can then easily implement the function on most computers as follows. Suppose
that the word size of the machine is w bits and that k fits into a single word. We
restrict A to be a fraction of the form s/2^w, where s is an integer in the range
0 < s < 2^w. Referring to Figure 11.4, we first multiply k by the w-bit integer
s = A · 2^w. The result is a 2w-bit value r_1 2^w + r_0, where r_1 is the high-order word
of the product and r_0 is the low-order word of the product. The desired p-bit hash
value consists of the p most significant bits of r_0.

Although this method works with any value of the constant A, it works better
with some values than with others. The optimal choice depends on the characteristics
of the data being hashed. Knuth [211] suggests that

A ≈ (√5 − 1)/2 = 0.6180339887...    (11.2)

is likely to work reasonably well.

As an example, suppose we have k = 123456, p = 14, m = 2^14 = 16384,
and w = 32. Adapting Knuth’s suggestion, we choose A to be the fraction of the
form s/2^32 that is closest to (√5 − 1)/2, so that A = 2654435769/2^32. Then
k · s = 327706022297664 = (76300 · 2^32) + 17612864, and so r_1 = 76300
and r_0 = 17612864. The 14 most significant bits of r_0 yield the value h(k) = 67.
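
The word-level implementation described above can be sketched in Python with a mask and a shift. The function name multiplication_hash is ours; its defaults w = 32 and s = 2654435769 are the values of the worked example, and we assume that the key fits in a single w-bit word.

```python
def multiplication_hash(k, p, w=32, s=2654435769):
    """Multiplication method with m = 2**p slots on a w-bit word.

    s is the w-bit integer A * 2**w; the default is the value the text
    derives from Knuth's suggestion for w = 32.
    """
    product = k * s                    # up to 2w bits: r1 * 2**w + r0
    r0 = product & ((1 << w) - 1)      # low-order word of the product
    return r0 >> (w - p)               # p most significant bits of r0


print(multiplication_hash(123456, 14))   # the worked example: 67
```

Masking with 2^w − 1 imitates the wraparound that a w-bit machine multiply would give for the low-order word.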

11.3.3 Universal hashing

If a malicious adversary chooses the keys to be hashed by some fixed hash function,
then the adversary can choose n keys that all hash to the same slot, yielding an average
retrieval time of Θ(n). Any fixed hash function is vulnerable to such terrible
worst-case behavior; the only effective way to improve the situation is to choose
the hash function randomly in a way that is independent of the keys that are actually
going to be stored. This approach, called universal hashing, can yield provably
good performance on average, no matter which keys the adversary chooses.

In universal hashing, at the beginning of execution we select the hash function
at random from a carefully designed class of functions. As in the case of quicksort,
randomization guarantees that no single input will always evoke worst-case
behavior. Because we randomly select the hash function, the algorithm can behave
differently on each execution, even for the same input, guaranteeing good
average-case performance for any input. Returning to the example of a compiler’s
symbol table, we find that the programmer’s choice of identifiers cannot now cause
consistently poor hashing performance. Poor performance occurs only when the
compiler chooses a random hash function that causes the set of identifiers to hash
poorly, but the probability of this situation occurring is small and is the same for
any set of identifiers of the same size.

Let ℋ be a finite collection of hash functions that map a given universe U of
keys into the range {0, 1, ..., m − 1}. Such a collection is said to be universal
if for each pair of distinct keys k, l ∈ U, the number of hash functions h ∈ ℋ
for which h(k) = h(l) is at most |ℋ|/m. In other words, with a hash function
randomly chosen from ℋ, the chance of a collision between distinct keys k and l
is no more than the chance 1/m of a collision if h(k) and h(l) were randomly and
independently chosen from the set {0, 1, ..., m − 1}.

The following theorem shows that a universal class of hash functions gives good
average-case behavior. Recall that n_i denotes the length of list T[i].

Theorem  11.3 

Suppose that a hash function h is chosen randomly from a universal collection of
hash functions and has been used to hash n keys into a table T of size m, using
chaining to resolve collisions. If key k is not in the table, then the expected
length E[n_{h(k)}] of the list that key k hashes to is at most the load factor α = n/m.
If key k is in the table, then the expected length E[n_{h(k)}] of the list containing key k
is at most 1 + α.

Proof We note that the expectations here are over the choice of the hash function
and do not depend on any assumptions about the distribution of the keys.
For each pair k and l of distinct keys, define the indicator random variable
X_kl = I{h(k) = h(l)}. Since by the definition of a universal collection of hash
functions, a single pair of keys collides with probability at most 1/m, we have
Pr{h(k) = h(l)} ≤ 1/m. By Lemma 5.1, therefore, we have E[X_kl] ≤ 1/m.

Next we define, for each key k, the random variable Y_k that equals the number
of keys other than k that hash to the same slot as k, so that

\[
Y_k = \sum_{\substack{l \in T \\ l \ne k}} X_{kl} .
\]

Thus we have

\[
\begin{aligned}
E[Y_k] &= E\left[\sum_{\substack{l \in T \\ l \ne k}} X_{kl}\right] \\
&= \sum_{\substack{l \in T \\ l \ne k}} E[X_{kl}] && \text{(by linearity of expectation)} \\
&\le \sum_{\substack{l \in T \\ l \ne k}} \frac{1}{m} .
\end{aligned}
\]

The remainder of the proof depends on whether key k is in table T.

* If k ∉ T, then n_{h(k)} = Y_k and |{l : l ∈ T and l ≠ k}| = n. Thus E[n_{h(k)}] =
  E[Y_k] ≤ n/m = α.

* If k ∈ T, then because key k appears in list T[h(k)] and the count Y_k does not
  include key k, we have n_{h(k)} = Y_k + 1 and |{l : l ∈ T and l ≠ k}| = n − 1.
  Thus E[n_{h(k)}] = E[Y_k] + 1 ≤ (n − 1)/m + 1 = 1 + α − 1/m < 1 + α. ■


The  following  corollary  says  universal  hashing  provides  the  desired  payoff:  it 
has  now  become  impossible  for  an  adversary  to  pick  a  sequence  of  operations  that 
forces  the  worst-case  running  time.  By  cleverly  randomizing  the  choice  of  hash 
function  at  run  time,  we  guarantee  that  we  can  process  every  sequence  of  operations 
with  a  good  average-case  running  time. 

Corollary  11.4 

Using universal hashing and collision resolution by chaining in an initially empty
table with m slots, it takes expected time Θ(n) to handle any sequence of n Insert,
Search, and Delete operations containing O(m) Insert operations.

Proof Since the number of insertions is O(m), we have n = O(m) and so
α = O(1). The Insert and Delete operations take constant time and, by Theorem
11.3, the expected time for each Search operation is O(1). By linearity of
expectation, therefore, the expected time for the entire sequence of n operations
is O(n). Since each operation takes Ω(1) time, the Θ(n) bound follows. ■

Designing  a  universal  class  of  hash  functions 

It  is  quite  easy  to  design  a  universal  class  of  hash  functions,  as  a  little  number 
theory  will  help  us  prove.  You  may  wish  to  consult  Chapter  31  first  if  you  are 
unfamiliar  with  number  theory. 

We begin by choosing a prime number p large enough so that every possible
key k is in the range 0 to p − 1, inclusive. Let Z_p denote the set {0, 1, ..., p − 1},
and let Z_p^* denote the set {1, 2, ..., p − 1}. Since p is prime, we can solve equations
modulo p with the methods given in Chapter 31. Because we assume that the
size of the universe of keys is greater than the number of slots in the hash table, we
have p > m.

We now define the hash function h_ab for any a ∈ Z_p^* and any b ∈ Z_p using a
linear transformation followed by reductions modulo p and then modulo m:

h_ab(k) = ((ak + b) mod p) mod m .    (11.3)

For example, with p = 17 and m = 6, we have h_{3,4}(8) = 5. The family of all
such hash functions is

ℋ_pm = {h_ab : a ∈ Z_p^* and b ∈ Z_p} .    (11.4)

Each hash function h_ab maps Z_p to Z_m. This class of hash functions has the nice
property that the size m of the output range is arbitrary (not necessarily prime), a
feature which we shall use in Section 11.5. Since we have p − 1 choices for a
and p choices for b, the collection ℋ_pm contains p(p − 1) hash functions.
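
Drawing a function from this family takes only two random integers. In the Python sketch below, the helper names hash_ab and make_universal_hash are ours, and the caller is assumed to supply a prime p larger than every key.

```python
import random


def hash_ab(a, b, p, m):
    """Return the function k -> ((a*k + b) mod p) mod m of equation (11.3)."""
    def h(k):
        return ((a * k + b) % p) % m
    return h


def make_universal_hash(p, m, rng=random):
    """Draw h_ab uniformly at random: a from {1, ..., p-1}, b from {0, ..., p-1}."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return hash_ab(a, b, p, m), (a, b)


h_3_4 = hash_ab(3, 4, p=17, m=6)
print(h_3_4(8))                       # (3*8 + 4) mod 17 = 11, and 11 mod 6 = 5

# A randomly drawn member of the family; the seed is fixed only for repeatability.
h, (a, b) = make_universal_hash(p=17, m=6, rng=random.Random(1))
print(a, b, [h(k) for k in range(17)])
```

The draw happens once, when the table is created; afterward h is an ordinary deterministic function of the key.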

Theorem  11.5 

The  class  Mpm  of  hash  functions  defined  by  equations  (11.3)  and  (1 1.4)  is  universal. 

Proof Consider two distinct keys $k$ and $l$ from $\mathbb{Z}_p$, so that $k \ne l$. For a given
hash function $h_{ab}$ we let

$$r = (ak + b) \bmod p ,$$
$$s = (al + b) \bmod p .$$

We first note that $r \ne s$. Why? Observe that

$$r - s \equiv a(k - l) \pmod{p} .$$

It follows that $r \ne s$ because $p$ is prime and both $a$ and $(k - l)$ are nonzero
modulo $p$, and so their product must also be nonzero modulo $p$ by Theorem 31.6.
Therefore, when computing any $h_{ab} \in \mathcal{H}_{pm}$, distinct inputs $k$ and $l$ map to distinct
values $r$ and $s$ modulo $p$; there are no collisions yet at the "mod $p$ level." Moreover,
each of the possible $p(p - 1)$ choices for the pair $(a, b)$ with $a \ne 0$ yields a different
resulting pair $(r, s)$ with $r \ne s$, since we can solve for $a$ and $b$ given $r$ and $s$:

$$a = ((r - s)((k - l)^{-1} \bmod p)) \bmod p ,$$
$$b = (r - ak) \bmod p ,$$

where $((k - l)^{-1} \bmod p)$ denotes the unique multiplicative inverse, modulo $p$,
of $k - l$. Since there are only $p(p - 1)$ possible pairs $(r, s)$ with $r \ne s$, there
is a one-to-one correspondence between pairs $(a, b)$ with $a \ne 0$ and pairs $(r, s)$
with $r \ne s$. Thus, for any given pair of inputs $k$ and $l$, if we pick $(a, b)$ uniformly
at random from $\mathbb{Z}_p^* \times \mathbb{Z}_p$, the resulting pair $(r, s)$ is equally likely to be any pair of
distinct values modulo $p$.

Therefore, the probability that distinct keys $k$ and $l$ collide is equal to the
probability that $r \equiv s \pmod{m}$ when $r$ and $s$ are randomly chosen as distinct values
modulo $p$. For a given value of $r$, of the $p - 1$ possible remaining values for $s$, the
number of values $s$ such that $s \ne r$ and $s \equiv r \pmod{m}$ is at most

$$\begin{aligned}
\lceil p/m \rceil - 1 &\le ((p + m - 1)/m) - 1 \quad \text{(by inequality (3.6))} \\
&= (p - 1)/m .
\end{aligned}$$

The probability that $s$ collides with $r$ when reduced modulo $m$ is at most
$((p - 1)/m)/(p - 1) = 1/m$.

Therefore, for any pair of distinct values $k, l \in \mathbb{Z}_p$,

$$\Pr\{h_{ab}(k) = h_{ab}(l)\} \le 1/m ,$$

so that $\mathcal{H}_{pm}$ is indeed universal. ■
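
For small parameters, the bound of Theorem 11.5 can be checked by brute force. The following Python sketch enumerates every pair $(a, b)$ for the example values $p = 17$ and $m = 6$ and counts, for each pair of distinct keys, how many functions in the family make them collide. The function names `make_hash` and `collision_count` are illustrative choices, not part of the text.

```python
from itertools import product


def make_hash(a, b, p, m):
    """Return h_ab(k) = ((a*k + b) mod p) mod m, as in equation (11.3)."""
    return lambda k: ((a * k + b) % p) % m


def collision_count(k, l, p, m):
    """Count the pairs (a, b), a in Z_p^*, b in Z_p, under which k and l collide."""
    count = 0
    for a, b in product(range(1, p), range(p)):
        h = make_hash(a, b, p, m)
        if h(k) == h(l):
            count += 1
    return count


p, m = 17, 6
family_size = p * (p - 1)  # number of functions in the family

# Worst case over all pairs of distinct keys in Z_p.
worst = max(collision_count(k, l, p, m)
            for k in range(p) for l in range(k + 1, p))

print("h_{3,4}(8) =", make_hash(3, 4, p, m)(8))
print("worst collision fraction:", worst / family_size, "bound:", 1 / m)
```

Universality requires that the worst collision fraction not exceed $1/m$; exhaustive enumeration is feasible only because $p$ is tiny here.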


Exercises 


11.3-1 

Suppose  we  wish  to  search  a  linked  list  of  length  n,  where  each  element  contains 
a  key  k  along  with  a  hash  value  h(k).  Each  key  is  a  long  character  string.  How 
might  we  take  advantage  of  the  hash  values  when  searching  the  list  for  an  element 
with  a  given  key? 


11.3-2 

Suppose that we hash a string of $r$ characters into $m$ slots by treating it as a
radix-128 number and then using the division method. We can easily represent
the number $m$ as a 32-bit computer word, but the string of $r$ characters, treated as
a radix-128 number, takes many words. How can we apply the division method to
compute the hash value of the character string without using more than a constant
number of words of storage outside the string itself?

11.3-3

Consider a version of the division method in which $h(k) = k \bmod m$, where
$m = 2^p - 1$ and $k$ is a character string interpreted in radix $2^p$. Show that if we
can derive string $x$ from string $y$ by permuting its characters, then $x$ and $y$ hash to
the same value. Give an example of an application in which this property would be
undesirable in a hash function.


11.3-4

Consider a hash table of size $m = 1000$ and a corresponding hash function
$h(k) = \lfloor m\,(kA \bmod 1) \rfloor$ for $A = (\sqrt{5} - 1)/2$. Compute the locations to which the keys
61, 62, 63, 64, and 65 are mapped.

11.3-5 *

Define a family $\mathcal{H}$ of hash functions from a finite set $U$ to a finite set $B$ to be
$\epsilon$-universal if for all pairs of distinct elements $k$ and $l$ in $U$,

$$\Pr\{h(k) = h(l)\} \le \epsilon ,$$

where the probability is over the choice of the hash function $h$ drawn at random
from the family $\mathcal{H}$. Show that an $\epsilon$-universal family of hash functions must have

$$\epsilon \ge \frac{1}{|B|} - \frac{1}{|U|} .$$


11.3-6 *

Let $U$ be the set of $n$-tuples of values drawn from $\mathbb{Z}_p$, and let $B = \mathbb{Z}_p$, where $p$
is prime. Define the hash function $h_b : U \to B$ for $b \in \mathbb{Z}_p$ on an input $n$-tuple
$\langle a_0, a_1, \ldots, a_{n-1} \rangle$ from $U$ as

$$h_b(\langle a_0, a_1, \ldots, a_{n-1} \rangle) = \Bigl(\sum_{j=0}^{n-1} a_j b^j\Bigr) \bmod p ,$$

and let $\mathcal{H} = \{h_b : b \in \mathbb{Z}_p\}$. Argue that $\mathcal{H}$ is $((n - 1)/p)$-universal according to
the definition of $\epsilon$-universal in Exercise 11.3-5. (Hint: See Exercise 31.4-4.)


11.4  Open  addressing 

In open addressing, all elements occupy the hash table itself. That is, each table
entry contains either an element of the dynamic set or NIL. When searching for
an element, we systematically examine table slots until either we find the desired
element or we have ascertained that the element is not in the table. No lists and
no elements are stored outside the table, unlike in chaining. Thus, in open
addressing, the hash table can "fill up" so that no further insertions can be made; one
consequence is that the load factor $\alpha$ can never exceed 1.

Of  course,  we  could  store  the  linked  lists  for  chaining  inside  the  hash  table,  in 
the  otherwise  unused  hash-table  slots  (see  Exercise  11.2-4),  but  the  advantage  of 
open  addressing  is  that  it  avoids  pointers  altogether.  Instead  of  following  pointers, 
we  compute  the  sequence  of  slots  to  be  examined.  The  extra  memory  freed  by  not 
storing  pointers  provides  the  hash  table  with  a  larger  number  of  slots  for  the  same 
amount  of  memory,  potentially  yielding  fewer  collisions  and  faster  retrieval. 

To perform insertion using open addressing, we successively examine, or probe,
the hash table until we find an empty slot in which to put the key. Instead of being
fixed in the order $0, 1, \ldots, m - 1$ (which requires $\Theta(n)$ search time), the sequence
of positions probed depends upon the key being inserted. To determine which slots
to probe, we extend the hash function to include the probe number (starting from 0)
as a second input. Thus, the hash function becomes

$$h : U \times \{0, 1, \ldots, m - 1\} \to \{0, 1, \ldots, m - 1\} .$$

With open addressing, we require that for every key $k$, the probe sequence

$$\langle h(k, 0), h(k, 1), \ldots, h(k, m - 1) \rangle$$

be a permutation of $\langle 0, 1, \ldots, m - 1 \rangle$, so that every hash-table position is eventually
considered as a slot for a new key as the table fills up. In the following pseudocode,
we assume that the elements in the hash table $T$ are keys with no satellite
information; the key $k$ is identical to the element containing key $k$. Each slot contains
either a key or NIL (if the slot is empty). The Hash-Insert procedure takes as
input a hash table $T$ and a key $k$. It either returns the slot number where it stores
key $k$ or flags an error because the hash table is already full.

Hash-Insert(T, k)
    i = 0
    repeat
        j = h(k, i)
        if T[j] == NIL
            T[j] = k
            return j
        else i = i + 1
    until i == m
    error "hash table overflow"

The algorithm for searching for key $k$ probes the same sequence of slots that the
insertion algorithm examined when key $k$ was inserted. Therefore, the search can
terminate (unsuccessfully) when it finds an empty slot, since $k$ would have been
inserted there and not later in its probe sequence. (This argument assumes that keys
are not deleted from the hash table.) The procedure Hash-Search takes as input
a hash table $T$ and a key $k$, returning $j$ if it finds that slot $j$ contains key $k$, or NIL
if key $k$ is not present in table $T$.

Hash-Search(T, k)
    i = 0
    repeat
        j = h(k, i)
        if T[j] == k
            return j
        i = i + 1
    until T[j] == NIL or i == m
    return NIL
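
The two procedures translate almost line for line into Python. The sketch below is one possible rendering, not a definitive implementation: it represents NIL as `None` and, purely for illustration, assumes a linear probe function `probe(k, i, m) = (k + i) mod m` on integer keys; any probe function that yields a permutation of the slots could be substituted.

```python
def probe(k, i, m):
    """Assumed probe function h(k, i): linear probing with h'(k) = k mod m."""
    return (k + i) % m


def hash_insert(T, k):
    """Store key k in the first empty slot of its probe sequence; return the slot."""
    m = len(T)
    for i in range(m):
        j = probe(k, i, m)
        if T[j] is None:
            T[j] = k
            return j
    raise OverflowError("hash table overflow")


def hash_search(T, k):
    """Return the slot holding k, or None if k is not in the table."""
    m = len(T)
    for i in range(m):
        j = probe(k, i, m)
        if T[j] == k:
            return j
        if T[j] is None:  # k would have been placed here, so it is absent
            return None
    return None


T = [None] * 7
slots = [hash_insert(T, k) for k in (10, 3, 17)]  # all three keys start at slot 3
print(slots, hash_search(T, 17), hash_search(T, 24))
```

The three keys share the initial probe position 3, so each later insertion probes one slot further; the search for the absent key 24 stops at the first empty slot it meets.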

Deletion  from  an  open-address  hash  table  is  difficult.  When  we  delete  a  key 
from  slot  i,  we  cannot  simply  mark  that  slot  as  empty  by  storing  NIL  in  it.  If 
we  did,  we  might  be  unable  to  retrieve  any  key  k  during  whose  insertion  we  had 
probed  slot  i  and  found  it  occupied.  We  can  solve  this  problem  by  marking  the 
slot,  storing  in  it  the  special  value  DELETED  instead  of  NIL.  We  would  then  modify 
the  procedure  Hash-Insert  to  treat  such  a  slot  as  if  it  were  empty  so  that  we  can 
insert  a  new  key  there.  We  do  not  need  to  modify  Hash-Search,  since  it  will  pass 
over  DELETED  values  while  searching.  When  we  use  the  special  value  DELETED, 
however, search times no longer depend on the load factor $\alpha$, and for this reason
chaining  is  more  commonly  selected  as  a  collision  resolution  technique  when  keys 
must  be  deleted. 
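
The marking scheme can be sketched in Python as follows. This is only an illustration under stated assumptions: the sentinel object `DELETED`, the linear probe function, and the helper `hash_delete` are choices made here, not definitions from the text. Insertion treats a DELETED slot as empty, while search passes over it.

```python
DELETED = object()  # sentinel distinct from None (NIL) and from every key


def probe(k, i, m):
    """Assumed probe function: linear probing with h'(k) = k mod m."""
    return (k + i) % m


def hash_insert(T, k):
    """Insert k into the first slot that is empty or marked DELETED."""
    m = len(T)
    for i in range(m):
        j = probe(k, i, m)
        if T[j] is None or T[j] is DELETED:
            T[j] = k
            return j
    raise OverflowError("hash table overflow")


def hash_search(T, k):
    """Search for k, passing over DELETED slots and stopping at an empty one."""
    m = len(T)
    for i in range(m):
        j = probe(k, i, m)
        if T[j] is None:
            return None
        if T[j] is not DELETED and T[j] == k:
            return j
    return None


def hash_delete(T, k):
    """Mark the slot holding k as DELETED; return the slot, or None if absent."""
    j = hash_search(T, k)
    if j is not None:
        T[j] = DELETED
    return j


T = [None] * 7
for key in (10, 3, 17):
    hash_insert(T, key)
hash_delete(T, 3)
found_17 = hash_search(T, 17)  # still reachable past the DELETED slot
reused = hash_insert(T, 24)    # reuses the DELETED slot
```

Had slot 4 been reset to `None` instead, the search for key 17 would have stopped there and wrongly reported the key as absent.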

In our analysis, we assume uniform hashing: the probe sequence of each key
is equally likely to be any of the $m!$ permutations of $\langle 0, 1, \ldots, m - 1 \rangle$.
Uniform hashing generalizes the notion of simple uniform hashing defined earlier to a
hash function that produces not just a single number, but a whole probe sequence.
True uniform hashing is difficult to implement, however, and in practice suitable
approximations (such as double hashing, defined below) are used.

We will examine three commonly used techniques to compute the probe
sequences required for open addressing: linear probing, quadratic probing, and
double hashing. These techniques all guarantee that $\langle h(k, 0), h(k, 1), \ldots, h(k, m - 1) \rangle$
is a permutation of $\langle 0, 1, \ldots, m - 1 \rangle$ for each key $k$. None of these techniques
fulfills the assumption of uniform hashing, however, since none of them is capable of
generating more than $m^2$ different probe sequences (instead of the $m!$ that uniform
hashing requires). Double hashing has the greatest number of probe sequences and,
as one might expect, seems to give the best results.

Linear probing

Given an ordinary hash function $h' : U \to \{0, 1, \ldots, m - 1\}$, which we refer to as
an auxiliary hash function, the method of linear probing uses the hash function

$$h(k, i) = (h'(k) + i) \bmod m$$

for $i = 0, 1, \ldots, m - 1$. Given key $k$, we first probe $T[h'(k)]$, i.e., the slot given
by the auxiliary hash function. We next probe slot $T[h'(k) + 1]$, and so on up to
slot $T[m - 1]$. Then we wrap around to slots $T[0], T[1], \ldots$ until we finally probe
slot $T[h'(k) - 1]$. Because the initial probe determines the entire probe sequence,
there are only $m$ distinct probe sequences.

Linear  probing  is  easy  to  implement,  but  it  suffers  from  a  problem  known  as 
primary  clustering.  Long  runs  of  occupied  slots  build  up,  increasing  the  average 
search  time.  Clusters  arise  because  an  empty  slot  preceded  by  i  full  slots  gets  filled 
next  with  probability  (i  +  1  )/m.  Long  runs  of  occupied  slots  tend  to  get  longer, 
and  the  average  search  time  increases. 
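
The clustering effect can be made concrete with a short Python sketch. Assuming the initial probe position $h'(k)$ of the next key is uniformly distributed over the $m$ slots, the function below (a hypothetical helper, `fill_probabilities`) computes, for each empty slot, the probability that it is the next slot filled: one plus the length of the run of occupied slots immediately before it, divided by $m$.

```python
from fractions import Fraction


def fill_probabilities(occupied):
    """Map each empty slot to the probability that linear probing fills it next.

    Assumes the next key's initial probe position is uniform over all slots
    and that the table has at least one empty slot.
    """
    m = len(occupied)
    probs = {}
    for j in range(m):
        if occupied[j]:
            continue
        run = 0  # number of consecutive occupied slots just before slot j
        while run < m - 1 and occupied[(j - 1 - run) % m]:
            run += 1
        probs[j] = Fraction(run + 1, m)
    return probs


# Slots 0..2 form a run of three occupied slots; slot 5 is a run of one.
occupied = [True, True, True, False, False, True, False, False]
probs = fill_probabilities(occupied)
for slot, pr in sorted(probs.items()):
    print(slot, pr)
```

The empty slot that follows the longest run is the one most likely to be filled, which in turn lengthens that run.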

Quadratic  probing 

Quadratic probing uses a hash function of the form

$$h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m , \qquad (11.5)$$

where $h'$ is an auxiliary hash function, $c_1$ and $c_2$ are positive auxiliary constants,
and $i = 0, 1, \ldots, m - 1$. The initial position probed is $T[h'(k)]$; later positions
probed are offset by amounts that depend in a quadratic manner on the probe
number $i$. This method works much better than linear probing, but to make full use of
the hash table, the values of $c_1$, $c_2$, and $m$ are constrained. Problem 11-3 shows
one way to select these parameters. Also, if two keys have the same initial probe
position, then their probe sequences are the same, since $h(k_1, 0) = h(k_2, 0)$
implies $h(k_1, i) = h(k_2, i)$. This property leads to a milder form of clustering, called
secondary clustering. As in linear probing, the initial probe determines the entire
sequence, and so only $m$ distinct probe sequences are used.
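
To see why the parameters are constrained, the Python sketch below compares two offset functions for a table of size $m = 16$. The first, $i(i+1)/2$ (that is, $c_1 = c_2 = 1/2$ with $m$ a power of 2), is a commonly used choice assumed here for illustration; the second, $i + i^2$ ($c_1 = c_2 = 1$), always produces an even offset and so cannot reach every slot when $m$ is even.

```python
def quadratic_probe_sequence(h0, m, offset):
    """Return the m probe positions (h0 + offset(i)) mod m for i = 0..m-1."""
    return [(h0 + offset(i)) % m for i in range(m)]


m = 16  # a power of 2

# c1 = c2 = 1/2: offsets are the triangular numbers 0, 1, 3, 6, 10, ...
triangular = quadratic_probe_sequence(5, m, lambda i: i * (i + 1) // 2)

# c1 = c2 = 1: offsets i + i^2 = i(i + 1) are always even.
even_only = quadratic_probe_sequence(5, m, lambda i: i + i * i)

print("slots reached with c1 = c2 = 1/2:", len(set(triangular)))
print("slots reached with c1 = c2 = 1:  ", len(set(even_only)))
```

Only the first sequence is a permutation of all 16 slots; with the second, an insertion could report overflow even though empty slots remain.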

Double  hashing 

Double hashing offers one of the best methods available for open addressing
because the permutations produced have many of the characteristics of randomly
chosen permutations. Double hashing uses a hash function of the form

$$h(k, i) = (h_1(k) + i\,h_2(k)) \bmod m ,$$

where both $h_1$ and $h_2$ are auxiliary hash functions. The initial probe goes to
position $T[h_1(k)]$; successive probe positions are offset from previous positions by the
amount $h_2(k)$, modulo $m$. Thus, unlike the case of linear or quadratic probing, the
probe sequence here depends in two ways upon the key $k$, since the initial probe
position, the offset, or both, may vary. Figure 11.5 gives an example of insertion
by double hashing.

Figure 11.5 Insertion by double hashing. Here we have a hash table of size 13 with $h_1(k) =
k \bmod 13$ and $h_2(k) = 1 + (k \bmod 11)$. Since $14 \equiv 1 \pmod{13}$ and $14 \equiv 3 \pmod{11}$, we insert
the key 14 into empty slot 9, after examining slots 1 and 5 and finding them to be occupied.

The value $h_2(k)$ must be relatively prime to the hash-table size $m$ for the entire
hash table to be searched. (See Exercise 11.4-4.) A convenient way to ensure this
condition is to let $m$ be a power of 2 and to design $h_2$ so that it always produces an
odd number. Another way is to let $m$ be prime and to design $h_2$ so that it always
returns a positive integer less than $m$. For example, we could choose $m$ prime and
let

$$h_1(k) = k \bmod m ,$$
$$h_2(k) = 1 + (k \bmod m') ,$$

where $m'$ is chosen to be slightly less than $m$ (say, $m - 1$). For example, if
$k = 123456$, $m = 701$, and $m' = 700$, we have $h_1(k) = 80$ and $h_2(k) = 257$, so
that we first probe position 80, and then we examine every 257th slot (modulo $m$)
until we find the key or have examined every slot.
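
The arithmetic in this example, and in Figure 11.5, can be reproduced with a few lines of Python. The function name `double_hash_probes` is an illustrative choice; it simply lists the full probe sequence for the pair of auxiliary functions given above.

```python
def double_hash_probes(k, m, m_prime):
    """Probe sequence for h1(k) = k mod m and h2(k) = 1 + (k mod m_prime)."""
    h1 = k % m
    h2 = 1 + (k % m_prime)
    return [(h1 + i * h2) % m for i in range(m)]


# Example from the text: k = 123456, m = 701, m' = 700.
probes = double_hash_probes(123456, 701, 700)
print(probes[:3])

# Figure 11.5: key 14 in a table of size 13 with h2(k) = 1 + (k mod 11).
figure_probes = double_hash_probes(14, 13, 11)
print(figure_probes[:3])
```

Because 701 is prime and $h_2(k) = 257$ is a positive integer less than 701, the probe sequence visits every slot exactly once.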

When $m$ is prime or a power of 2, double hashing improves over linear or
quadratic probing in that $\Theta(m^2)$ probe sequences are used, rather than $\Theta(m)$, since
each possible $(h_1(k), h_2(k))$ pair yields a distinct probe sequence. As a result, for
such values of $m$, the performance of double hashing appears to be very close to
the performance of the "ideal" scheme of uniform hashing.

Although values of $m$ other than primes or powers of 2 could in principle be
used with double hashing, in practice it becomes more difficult to efficiently
generate $h_2(k)$ in a way that ensures that it is relatively prime to $m$, in part because the
relative density $\phi(m)/m$ of such numbers may be small (see equation (31.24)).

Analysis  of  open-address  hashing 

As in our analysis of chaining, we express our analysis of open addressing in terms
of the load factor $\alpha = n/m$ of the hash table. Of course, with open addressing, at
most one element occupies each slot, and thus $n \le m$, which implies $\alpha \le 1$.

We assume that we are using uniform hashing. In this idealized scheme, the
probe sequence $\langle h(k, 0), h(k, 1), \ldots, h(k, m - 1) \rangle$ used to insert or search for
each key $k$ is equally likely to be any permutation of $\langle 0, 1, \ldots, m - 1 \rangle$. Of course,
a  given  key  has  a  unique  fixed  probe  sequence  associated  with  it;  what  we  mean 
here  is  that,  considering  the  probability  distribution  on  the  space  of  keys  and  the 
operation  of  the  hash  function  on  the  keys,  each  possible  probe  sequence  is  equally 
likely. 

We  now  analyze  the  expected  number  of  probes  for  hashing  with  open  address¬ 
ing  under  the  assumption  of  uniform  hashing,  beginning  with  an  analysis  of  the 
number  of  probes  made  in  an  unsuccessful  search. 

Theorem  11.6 

Given an open-address hash table with load factor $\alpha = n/m < 1$, the expected
number of probes in an unsuccessful search is at most $1/(1 - \alpha)$, assuming uniform
hashing.

Proof In an unsuccessful search, every probe but the last accesses an occupied
slot that does not contain the desired key, and the last slot probed is empty. Let us
define the random variable $X$ to be the number of probes made in an unsuccessful
search, and let us also define the event $A_i$, for $i = 1, 2, \ldots$, to be the event that
an $i$th probe occurs and it is to an occupied slot. Then the event $\{X \ge i\}$ is the
intersection of events $A_1 \cap A_2 \cap \cdots \cap A_{i-1}$. We will bound $\Pr\{X \ge i\}$ by bounding
$\Pr\{A_1 \cap A_2 \cap \cdots \cap A_{i-1}\}$. By Exercise C.2-5,

$$\Pr\{A_1 \cap A_2 \cap \cdots \cap A_{i-1}\} = \Pr\{A_1\} \cdot \Pr\{A_2 \mid A_1\} \cdot \Pr\{A_3 \mid A_1 \cap A_2\} \cdots \Pr\{A_{i-1} \mid A_1 \cap A_2 \cap \cdots \cap A_{i-2}\} .$$

Since there are $n$ elements and $m$ slots, $\Pr\{A_1\} = n/m$. For $j > 1$, the probability
that there is a $j$th probe and it is to an occupied slot, given that the first $j - 1$
probes were to occupied slots, is $(n - j + 1)/(m - j + 1)$. This probability follows
because we would be finding one of the remaining $(n - (j - 1))$ elements in one
of the $(m - (j - 1))$ unexamined slots, and by the assumption of uniform hashing,
the probability is the ratio of these quantities. Observing that $n < m$ implies that
$(n - j)/(m - j) \le n/m$ for all $j$ such that $0 \le j < m$, we have for all $i$ such that
$1 \le i \le m$,


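$$\begin{aligned}
\Pr\{X \ge i\} &= \frac{n}{m} \cdot \frac{n - 1}{m - 1} \cdot \frac{n - 2}{m - 2} \cdots \frac{n - i + 2}{m - i + 2} \\
&\le \Bigl(\frac{n}{m}\Bigr)^{i-1} \\
&= \alpha^{i-1} .
\end{aligned}$$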

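Now, we use equation (C.25) to bound the expected number of probes:

$$\begin{aligned}
\mathrm{E}[X] &= \sum_{i=1}^{\infty} \Pr\{X \ge i\} \\
&\le \sum_{i=1}^{\infty} \alpha^{i-1} \\
&= \sum_{i=0}^{\infty} \alpha^{i} \\
&= \frac{1}{1 - \alpha} . \qquad ■
\end{aligned}$$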


This bound of $1/(1 - \alpha) = 1 + \alpha + \alpha^2 + \alpha^3 + \cdots$ has an intuitive interpretation.
We always make the first probe. With probability approximately $\alpha$, the first probe
finds an occupied slot, so that we need to probe a second time. With probability
approximately $\alpha^2$, the first two slots are occupied so that we make a third probe,
and so on.

If $\alpha$ is a constant, Theorem 11.6 predicts that an unsuccessful search runs in $O(1)$
time. For example, if the hash table is half full, the average number of probes in an
unsuccessful search is at most $1/(1 - .5) = 2$. If it is 90 percent full, the average
number of probes is at most $1/(1 - .9) = 10$.
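
The bound can be compared with the exact expectation under uniform hashing by summing $\Pr\{X \ge i\}$, using the product of ratios $(n - j + 1)/(m - j + 1)$ that appears in the proof of Theorem 11.6. The Python sketch below does this with exact rational arithmetic; the function name and the table sizes are illustrative assumptions.

```python
from fractions import Fraction


def expected_unsuccessful_probes(n, m):
    """Exact E[X] = sum over i >= 1 of Pr{X >= i} under uniform hashing (n < m)."""
    total = Fraction(0)
    prob = Fraction(1)  # Pr{X >= 1} = 1: the first probe is always made
    i = 1
    while prob > 0:
        total += prob
        # Pr{X >= i + 1} = Pr{X >= i} * (n - i + 1) / (m - i + 1)
        prob *= Fraction(n - i + 1, m - i + 1)
        i += 1
    return total


def bound(n, m):
    """The upper bound 1 / (1 - alpha) of Theorem 11.6, with alpha = n / m."""
    return 1 / (1 - Fraction(n, m))


half_full = expected_unsuccessful_probes(5, 10)
nearly_full = expected_unsuccessful_probes(9, 10)
print(half_full, "<=", bound(5, 10))
print(nearly_full, "<=", bound(9, 10))
```

For a table this small the exact values fall noticeably below the bound; the gap narrows as $m$ grows with $\alpha$ held fixed.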

Theorem  11.6  gives  us  the  performance  of  the  Hash-Insert  procedure  almost 
immediately. 

Corollary  11.7 

Inserting an element into an open-address hash table with load factor $\alpha$ requires at
most $1/(1 - \alpha)$ probes on average, assuming uniform hashing.

Proof An element is inserted only if there is room in the table, and thus $\alpha < 1$.
Inserting a key requires an unsuccessful search followed by placing the key into the
first empty slot found. Thus, the expected number of probes is at most $1/(1 - \alpha)$. ■

We  have  to  do  a  little  more  work  to  compute  the  expected  number  of  probes  for 
a  successful  search. 


Theorem  11.8 

Given an open-address hash table with load factor $\alpha < 1$, the expected number of
probes in a successful search is at most

$$\frac{1}{\alpha} \ln \frac{1}{1 - \alpha} ,$$

assuming uniform hashing and assuming that each key in the table is equally likely
to be searched for.


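Proof A search for a key $k$ reproduces the same probe sequence as when the
element with key $k$ was inserted. By Corollary 11.7, if $k$ was the $(i + 1)$st key
inserted into the hash table, the expected number of probes made in a search for $k$
is at most $1/(1 - i/m) = m/(m - i)$. Averaging over all $n$ keys in the hash table
gives us the expected number of probes in a successful search:

$$\begin{aligned}
\frac{1}{n} \sum_{i=0}^{n-1} \frac{m}{m - i} &= \frac{m}{n} \sum_{i=0}^{n-1} \frac{1}{m - i} \\
&= \frac{1}{\alpha} \sum_{k=m-n+1}^{m} \frac{1}{k} \\
&\le \frac{1}{\alpha} \int_{m-n}^{m} \frac{1}{x}\,dx \quad \text{(by inequality (A.12))} \\
&= \frac{1}{\alpha} \ln \frac{m}{m - n} \\
&= \frac{1}{\alpha} \ln \frac{1}{1 - \alpha} . \qquad ■
\end{aligned}$$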


If  the  hash  table  is  half  full,  the  expected  number  of  probes  in  a  successful  search 
is  less  than  1.387.  If  the  hash  table  is  90  percent  full,  the  expected  number  of  probes 
is  less  than  2.559. 
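
These two figures can be checked numerically. The Python sketch below assumes the bound of Theorem 11.8 has the form $(1/\alpha) \ln(1/(1 - \alpha))$, which is consistent with the values quoted above; the function name is an illustrative choice.

```python
import math


def successful_search_bound(alpha):
    """Assumed bound of Theorem 11.8: (1/alpha) * ln(1 / (1 - alpha))."""
    return (1 / alpha) * math.log(1 / (1 - alpha))


half_full = successful_search_bound(0.5)
ninety_percent = successful_search_bound(0.9)
print(round(half_full, 4), round(ninety_percent, 4))
```

Both values sit just under the rounded figures 1.387 and 2.559 given in the text.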

Exercises


11.4-1

Consider inserting the keys 10, 22, 31, 4, 15, 28, 17, 88, 59 into a hash table of
length $m = 11$ using open addressing with the auxiliary hash function $h'(k) = k$.
Illustrate the result of inserting these keys using linear probing, using quadratic
probing with $c_1 = 1$ and $c_2 = 3$, and using double hashing with $h_1(k) = k$ and
$h_2(k) = 1 + (k \bmod (m - 1))$.


11.4-2 

Write pseudocode for Hash-Delete as outlined in the text, and modify
Hash-Insert to handle the special value DELETED.


11.4-3 

Consider  an  open-address  hash  table  with  uniform  hashing.  Give  upper  bounds 
on  the  expected  number  of  probes  in  an  unsuccessful  search  and  on  the  expected 
number  of  probes  in  a  successful  search  when  the  load  factor  is  3/4  and  when  it 
is  7/8. 


11.4-4 *

Suppose that we use double hashing to resolve collisions; that is, we use the hash
function $h(k, i) = (h_1(k) + i\,h_2(k)) \bmod m$. Show that if $m$ and $h_2(k)$ have
greatest common divisor $d \ge 1$ for some key $k$, then an unsuccessful search for
key $k$ examines $(1/d)$th of the hash table before returning to slot $h_1(k)$. Thus,
when $d = 1$, so that $m$ and $h_2(k)$ are relatively prime, the search may examine the
entire hash table. (Hint: See Chapter 31.)

11.4-5 *

Consider an open-address hash table with a load factor $\alpha$. Find the nonzero value $\alpha$
for which the expected number of probes in an unsuccessful search equals twice
the expected number of probes in a successful search. Use the upper bounds given
by Theorems 11.6 and 11.8 for these expected numbers of probes.


★  11.5  Perfect  hashing 

Although hashing is often a good choice for its excellent average-case
performance, hashing can also provide excellent worst-case performance when the set of
keys is static: once the keys are stored in the table, the set of keys never changes.
Some applications naturally have static sets of keys: consider the set of reserved
words in a programming language, or the set of file names on a CD-ROM. We
call a hashing technique perfect hashing if $O(1)$ memory accesses are required to
perform a search in the worst case.

Figure 11.6 Using perfect hashing to store the set $K = \{10, 22, 37, 40, 52, 60, 70, 72, 75\}$. The
outer hash function is $h(k) = ((ak + b) \bmod p) \bmod m$, where $a = 3$, $b = 42$, $p = 101$, and
$m = 9$. For example, $h(75) = 2$, and so key 75 hashes to slot 2 of table $T$. A secondary hash
table $S_j$ stores all keys hashing to slot $j$. The size of hash table $S_j$ is $m_j = n_j^2$, and the associated
hash function is $h_j(k) = ((a_j k + b_j) \bmod p) \bmod m_j$. Since $h_2(75) = 7$, key 75 is stored in slot 7
of secondary hash table $S_2$. No collisions occur in any of the secondary hash tables, and so searching
takes constant time in the worst case.

To create a perfect hashing scheme, we use two levels of hashing, with universal
hashing at each level. Figure 11.6 illustrates the approach.

The  first  level  is  essentially  the  same  as  for  hashing  with  chaining:  we  hash 
the  n  keys  into  m  slots  using  a  hash  function  h  carefully  selected  from  a  family  of 
universal  hash  functions. 

Instead of making a linked list of the keys hashing to slot $j$, however, we use a
small secondary hash table $S_j$ with an associated hash function $h_j$. By choosing
the hash functions $h_j$ carefully, we can guarantee that there are no collisions at the
secondary level.

In order to guarantee that there are no collisions at the secondary level, however,
we will need to let the size $m_j$ of hash table $S_j$ be the square of the number $n_j$ of
keys hashing to slot $j$. Although the quadratic dependence of $m_j$ on $n_j$ may seem
likely to cause the overall storage requirement to be excessive, we shall show that
by choosing the first-level hash function well, we can limit the expected total
amount of space used to $O(n)$.

We use hash functions chosen from the universal classes of hash functions of
Section 11.3.3. The first-level hash function comes from the class $\mathcal{H}_{pm}$, where as
in Section 11.3.3, $p$ is a prime number greater than any key value. Those keys
hashing to slot $j$ are re-hashed into a secondary hash table $S_j$ of size $m_j$ using a
hash function $h_j$ chosen from the class $\mathcal{H}_{p,m_j}$.¹

We  shall  proceed  in  two  steps.  First,  we  shall  determine  how  to  ensure  that 
the  secondary  tables  have  no  collisions.  Second,  we  shall  show  that  the  expected 
amount  of  memory  used  overall— for  the  primary  hash  table  and  all  the  secondary 
hash  tables— is  O(n). 

Theorem  11.9 

Suppose that we store $n$ keys in a hash table of size $m = n^2$ using a hash function $h$
randomly chosen from a universal class of hash functions. Then, the probability is
less than 1/2 that there are any collisions.

Proof There are $\binom{n}{2}$ pairs of keys that may collide; each pair collides with
probability at most $1/m$ if $h$ is chosen at random from a universal family $\mathcal{H}$ of hash functions.
Let $X$ be a random variable that counts the number of collisions. When $m = n^2$,
the expected number of collisions is

$$\begin{aligned}
\mathrm{E}[X] &\le \binom{n}{2} \cdot \frac{1}{n^2} \\
&= \frac{n^2 - n}{2} \cdot \frac{1}{n^2} \\
&< 1/2 .
\end{aligned}$$

(This analysis is similar to the analysis of the birthday paradox in Section 5.4.1.)
Applying Markov's inequality (C.30), $\Pr\{X \ge t\} \le \mathrm{E}[X]/t$, with $t = 1$,
completes the proof. ■


In the situation described in Theorem 11.9, where $m = n^2$, it follows that a hash
function $h$ chosen at random from $\mathcal{H}$ is more likely than not to have no collisions.
Given the set $K$ of $n$ keys to be hashed (remember that $K$ is static), it is thus easy
to find a collision-free hash function $h$ with a few random trials.

When $n$ is large, however, a hash table of size $m = n^2$ is excessive. Therefore,
we adopt the two-level hashing approach, and we use the approach of Theorem 11.9
only to hash the entries within each slot. We use an outer, or first-level, hash
function $h$ to hash the keys into $m = n$ slots. Then, if $n_j$ keys hash to slot $j$, we
use a secondary hash table $S_j$ of size $m_j = n_j^2$ to provide collision-free
constant-time lookup.
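
The two-level construction can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: hash functions are drawn from the family $((ak + b) \bmod p) \bmod m$ of Section 11.3.3, each level simply redraws its function until it succeeds, and the first level is accepted only when the secondary tables together use fewer than $4n$ slots (a threshold chosen here). The names `draw_hash`, `build_perfect_table`, and `lookup` are hypothetical.

```python
import random


def draw_hash(p, m, rng):
    """Draw h(k) = ((a*k + b) mod p) mod m with a in Z_p^* and b in Z_p."""
    a = rng.randrange(1, p)
    b = rng.randrange(p)
    return lambda k: ((a * k + b) % p) % m


def build_perfect_table(keys, p, rng):
    """Build a two-level table for a static key set; p is a prime above every key."""
    n = len(keys)
    while True:  # first level: m = n slots, retry until secondary storage < 4n
        h = draw_hash(p, n, rng)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) < 4 * n:
            break
    secondary = []
    for bucket in buckets:
        mj = len(bucket) ** 2  # secondary table size m_j = n_j^2
        while True:  # retry until h_j has no collisions on this bucket
            hj = draw_hash(p, mj, rng) if mj else None
            slots = [None] * mj
            collision = False
            for k in bucket:
                if slots[hj(k)] is not None:
                    collision = True
                    break
                slots[hj(k)] = k
            if not collision:
                break
        secondary.append((hj, slots))
    return h, secondary


def lookup(table, k):
    """Return True if k is stored; one probe per level."""
    h, secondary = table
    hj, slots = secondary[h(k)]
    return bool(slots) and slots[hj(k)] == k


K = [10, 22, 37, 40, 52, 60, 70, 72, 75]
table = build_perfect_table(K, 101, random.Random(0))
total_secondary = sum(len(slots) for _, slots in table[1])
print(all(lookup(table, k) for k in K), lookup(table, 11), total_secondary)
```

Each lookup evaluates two hash functions and inspects a single secondary slot, so the search cost is constant in the worst case once the table has been built.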


¹ When $n_j = m_j = 1$, we don't really need a hash function for slot $j$; when we choose a hash
function $h_{ab}(k) = ((ak + b) \bmod p) \bmod m_j$ for such a slot, we just use $a = b = 0$.




We now turn to the issue of ensuring that the overall memory used is $O(n)$.
Since the size $m_j$ of the $j$th secondary hash table grows quadratically with the
number $n_j$ of keys stored, we run the risk that the overall amount of storage could
be excessive.

If the first-level table size is $m = n$, then the amount of memory used is $O(n)$
for the primary hash table, for the storage of the sizes $m_j$ of the secondary hash
tables, and for the storage of the parameters $a_j$ and $b_j$ defining the secondary hash
functions $h_j$ drawn from the class $\mathcal{H}_{p,m_j}$ of Section 11.3.3 (except when $n_j = 1$
and we use $a = b = 0$). The following theorem and a corollary provide a bound on
the expected combined sizes of all the secondary hash tables. A second corollary
bounds the probability that the combined size of all the secondary hash tables is
superlinear (actually, that it equals or exceeds $4n$).

Theorem  11.10 

Suppose that we store $n$ keys in a hash table of size $m = n$ using a hash function $h$
randomly chosen from a universal class of hash functions. Then, we have

$$\mathrm{E}\Bigl[\sum_{j=0}^{m-1} n_j^2\Bigr] < 2n ,$$

where $n_j$ is the number of keys hashing to slot $j$.

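Proof We start with the following identity, which holds for any nonnegative
integer $a$:

$$a^2 = a + 2\binom{a}{2} . \qquad (11.6)$$

We have

$$\begin{aligned}
\mathrm{E}\Bigl[\sum_{j=0}^{m-1} n_j^2\Bigr]
&= \mathrm{E}\Bigl[\sum_{j=0}^{m-1} \Bigl(n_j + 2\binom{n_j}{2}\Bigr)\Bigr] && \text{(by equation (11.6))} \\
&= \mathrm{E}\Bigl[\sum_{j=0}^{m-1} n_j\Bigr] + 2\,\mathrm{E}\Bigl[\sum_{j=0}^{m-1} \binom{n_j}{2}\Bigr] && \text{(by linearity of expectation)} \\
&= \mathrm{E}[n] + 2\,\mathrm{E}\Bigl[\sum_{j=0}^{m-1} \binom{n_j}{2}\Bigr] && \text{(by equation (11.1))} \\
&= n + 2\,\mathrm{E}\Bigl[\sum_{j=0}^{m-1} \binom{n_j}{2}\Bigr] && \text{(since $n$ is not a random variable)} .
\end{aligned}$$

To evaluate the summation $\sum_{j=0}^{m-1} \binom{n_j}{2}$, we observe that it is just the total number
of pairs of keys in the hash table that collide. By the properties of universal hashing,
the expected value of this summation is at most

$$\binom{n}{2} \frac{1}{m} = \frac{n(n - 1)}{2m} = \frac{n - 1}{2} ,$$

since $m = n$. Thus,

$$\begin{aligned}
\mathrm{E}\Bigl[\sum_{j=0}^{m-1} n_j^2\Bigr] &\le n + 2 \cdot \frac{n - 1}{2} \\
&= 2n - 1 \\
&< 2n . \qquad ■
\end{aligned}$$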

Corollary  11.11 

Suppose that we store $n$ keys in a hash table of size $m = n$ using a hash
function $h$ randomly chosen from a universal class of hash functions, and we set the
size of each secondary hash table to $m_j = n_j^2$ for $j = 0, 1, \ldots, m - 1$. Then,
the expected amount of storage required for all secondary hash tables in a perfect
hashing scheme is less than $2n$.

Proof Since $m_j = n_j^2$ for $j = 0, 1, \ldots, m - 1$, Theorem 11.10 gives

$$\mathrm{E}\Bigl[\sum_{j=0}^{m-1} m_j\Bigr] = \mathrm{E}\Bigl[\sum_{j=0}^{m-1} n_j^2\Bigr] < 2n , \qquad (11.7)$$

which completes the proof. ■


Corollary  11.12 

Suppose that we store $n$ keys in a hash table of size $m = n$ using a hash function $h$
randomly chosen from a universal class of hash functions, and we set the size
of each secondary hash table to $m_j = n_j^2$ for $j = 0, 1, \ldots, m - 1$. Then, the
probability is less than 1/2 that the total storage used for secondary hash tables
equals or exceeds $4n$.

Proof Again we apply Markov's inequality (C.30), $\Pr\{X \ge t\} \le \mathrm{E}[X]/t$, this
time to inequality (11.7), with $X = \sum_{j=0}^{m-1} m_j$ and $t = 4n$:

$$\begin{aligned}
\Pr\Bigl\{\sum_{j=0}^{m-1} m_j \ge 4n\Bigr\} &\le \frac{\mathrm{E}\bigl[\sum_{j=0}^{m-1} m_j\bigr]}{4n} \\
&< \frac{2n}{4n} \\
&= 1/2 . \qquad ■
\end{aligned}$$

From Corollary 11.12, we see that if we test a few randomly chosen hash
functions from the universal family, we will quickly find one that uses a reasonable
amount of storage.

Exercises

11.5-1 *

Suppose that we insert $n$ keys into a hash table of size $m$ using open addressing
and uniform hashing. Let $p(n, m)$ be the probability that no collisions occur. Show
that $p(n, m) \le e^{-n(n-1)/2m}$. (Hint: See equation (3.12).) Argue that when $n$
exceeds $\sqrt{m}$, the probability of avoiding collisions goes rapidly to zero.


Problems 


11-1  Longest-probe  bound  for  hashing 

Suppose that we use an open-addressed hash table of size $m$ to store $n \le m/2$
items.

a. Assuming uniform hashing, show that for $i = 1, 2, \ldots, n$, the probability is at
most $2^{-k}$ that the $i$th insertion requires strictly more than $k$ probes.

b. Show that for $i = 1, 2, \ldots, n$, the probability is $O(1/n^2)$ that the $i$th insertion
requires more than $2 \lg n$ probes.

Let the random variable $X_i$ denote the number of probes required by the $i$th
insertion. You have shown in part (b) that $\Pr\{X_i > 2 \lg n\} = O(1/n^2)$. Let the random
variable $X = \max_{1 \le i \le n} X_i$ denote the maximum number of probes required by
any of the $n$ insertions.

c. Show that $\Pr\{X > 2 \lg n\} = O(1/n)$.

d. Show that the expected length $\mathrm{E}[X]$ of the longest probe sequence is $O(\lg n)$.

11-2  Slot-size bound for chaining

Suppose that we have a hash table with $n$ slots, with collisions resolved by
chaining, and suppose that $n$ keys are inserted into the table. Each key is equally likely
to be hashed to each slot. Let $M$ be the maximum number of keys in any slot after
all the keys have been inserted. Your mission is to prove an $O(\lg n / \lg \lg n)$ upper
bound on $\mathrm{E}[M]$, the expected value of $M$.

a.  Argue that the probability $Q_k$ that exactly $k$ keys hash to a particular slot is
given by

$$Q_k = \left(\frac{1}{n}\right)^k \left(1 - \frac{1}{n}\right)^{n-k} \binom{n}{k} .$$

b.  Let $P_k$ be the probability that $M = k$, that is, the probability that the slot
containing the most keys contains $k$ keys. Show that $P_k \le n Q_k$.

c.  Use Stirling’s approximation, equation (3.18), to show that $Q_k < e^k / k^k$.

d.  Show that there exists a constant $c > 1$ such that $Q_{k_0} < 1/n^3$ for
$k_0 = c \lg n / \lg \lg n$. Conclude that $P_k < 1/n^2$ for $k \ge k_0 = c \lg n / \lg \lg n$.


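e.  Argue that

$$\mathrm{E}[M] \le \Pr\left\{M > \frac{c \lg n}{\lg \lg n}\right\} \cdot n + \Pr\left\{M \le \frac{c \lg n}{\lg \lg n}\right\} \cdot \frac{c \lg n}{\lg \lg n} .$$

Conclude that $\mathrm{E}[M] = O(\lg n / \lg \lg n)$.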

11-3  Quadratic  probing 

Suppose that we are given a key $k$ to search for in a hash table with positions
$0, 1, \ldots, m-1$, and suppose that we have a hash function $h$ mapping the key space
into the set $\{0, 1, \ldots, m-1\}$. The search scheme is as follows:

1.  Compute the value $j = h(k)$, and set $i = 0$.

2.  Probe in position $j$ for the desired key $k$. If you find it, or if this position is
empty, terminate the search.

3.  Set $i = i + 1$. If $i$ now equals $m$, the table is full, so terminate the search.
Otherwise, set $j = (i + j) \bmod m$, and return to step 2.

Assume  that  m  is  a  power  of  2. 
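
To make the three steps concrete, here is a small Python sketch that lists the positions the scheme would probe if it never found the key or an empty slot. The function name `probe_sequence` and the parameter `start` (standing for $h(k)$) are assumptions made for illustration; the sketch only enumerates positions and proves nothing.

```python
def probe_sequence(start, m):
    """Positions probed by the scheme, beginning at j = start = h(k)."""
    j = start
    positions = [j]                # step 2 with i = 0
    for i in range(1, m):          # step 3: i = 1, 2, ..., m - 1
        j = (i + j) % m
        positions.append(j)
    return positions

print(probe_sequence(3, 8))        # [3, 4, 6, 1, 5, 2, 0, 7]
```

For $m = 8$ the printed sequence contains each position exactly once, whereas a table size that is not a power of 2, such as $m = 6$, can revisit positions.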

a.  Show that this scheme is an instance of the general “quadratic probing” scheme
by exhibiting the appropriate constants $c_1$ and $c_2$ for equation (11.5).

b.  Prove that this algorithm examines every table position in the worst case.

11-4  Hashing and authentication

Let $\mathcal{H}$ be a class of hash functions in which each hash function $h \in \mathcal{H}$ maps the
universe $U$ of keys to $\{0, 1, \ldots, m-1\}$. We say that $\mathcal{H}$ is $k$-universal if, for every
fixed sequence of $k$ distinct keys $\langle x^{(1)}, x^{(2)}, \ldots, x^{(k)} \rangle$ and for any $h$ chosen at
random from $\mathcal{H}$, the sequence $\langle h(x^{(1)}), h(x^{(2)}), \ldots, h(x^{(k)}) \rangle$ is equally likely to be
any of the $m^k$ sequences of length $k$ with elements drawn from $\{0, 1, \ldots, m-1\}$.

a.  Show that if the family $\mathcal{H}$ of hash functions is 2-universal, then it is universal.

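b.  Suppose that the universe $U$ is the set of $n$-tuples of values drawn from
$\mathbb{Z}_p = \{0, 1, \ldots, p-1\}$, where $p$ is prime. Consider an element
$x = \langle x_0, x_1, \ldots, x_{n-1} \rangle \in U$. For any $n$-tuple
$a = \langle a_0, a_1, \ldots, a_{n-1} \rangle \in U$, define the hash function $h_a$ by

$$h_a(x) = \left( \sum_{j=0}^{n-1} a_j x_j \right) \bmod p .$$

Let $\mathcal{H} = \{h_a\}$. Show that $\mathcal{H}$ is universal, but not 2-universal. (Hint: Find a key
for which all hash functions in $\mathcal{H}$ produce the same value.)

c.  Suppose that we modify $\mathcal{H}$ slightly from part (b): for any $a \in U$ and for any
$b \in \mathbb{Z}_p$, define

$$h'_{ab}(x) = \left( \sum_{j=0}^{n-1} a_j x_j + b \right) \bmod p$$

and $\mathcal{H}' = \{h'_{ab}\}$. Argue that $\mathcal{H}'$ is 2-universal. (Hint: Consider fixed $n$-tuples
$x \in U$ and $y \in U$, with $x_i \ne y_i$ for some $i$. What happens to $h'_{ab}(x)$
and $h'_{ab}(y)$ as $a_i$ and $b$ range over $\mathbb{Z}_p$?)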

d.  Suppose that Alice and Bob secretly agree on a hash function $h$ from a
2-universal family $\mathcal{H}$ of hash functions. Each $h \in \mathcal{H}$ maps from a universe of
keys $U$ to $\mathbb{Z}_p$, where $p$ is prime. Later, Alice sends a message $m$ to Bob over the
Internet, where $m \in U$. She authenticates this message to Bob by also sending
an authentication tag $t = h(m)$, and Bob checks that the pair $(m, t)$ he receives
indeed satisfies $t = h(m)$. Suppose that an adversary intercepts $(m, t)$ en route
and tries to fool Bob by replacing the pair $(m, t)$ with a different pair $(m', t')$.
Argue that the probability that the adversary succeeds in fooling Bob into
accepting $(m', t')$ is at most $1/p$, no matter how much computing power the
adversary has, and even if the adversary knows the family $\mathcal{H}$ of hash functions
used.


Chapter notes

Knuth [211] and Gonnet [145] are excellent references for the analysis of hashing
algorithms. Knuth credits H. P. Luhn (1953) for inventing hash tables, along
with the chaining method for resolving collisions. At about the same time, G. M.
Amdahl originated the idea of open addressing.

Carter and Wegman introduced the notion of universal classes of hash functions
in 1979 [58].

Fredman, Komlós, and Szemerédi [112] developed the perfect hashing scheme
for static sets presented in Section 11.5. An extension of their method to dynamic
sets, handling insertions and deletions in amortized expected time $O(1)$, has been
given by Dietzfelbinger et al. [86].

Binary Search Trees


The  search  tree  data  structure  supports  many  dynamic-set  operations,  including 
Search,  Minimum,  Maximum,  Predecessor,  Successor,  Insert,  and 
Delete.  Thus,  we  can  use  a  search  tree  both  as  a  dictionary  and  as  a  priority 
queue. 

Basic operations on a binary search tree take time proportional to the height of
the tree. For a complete binary tree with n nodes, such operations run in $\Theta(\lg n)$
worst-case time. If the tree is a linear chain of n nodes, however, the same
operations take $\Theta(n)$ worst-case time. We shall see in Section 12.4 that the expected
height of a randomly built binary search tree is $O(\lg n)$, so that basic dynamic-set
operations on such a tree take $\Theta(\lg n)$ time on average.

In practice, we can’t always guarantee that binary search trees are built
randomly, but we can design variations of binary search trees with good guaranteed
worst-case performance on basic operations. Chapter 13 presents one such
variation, red-black trees, which have height $O(\lg n)$. Chapter 18 introduces B-trees,
which are particularly good for maintaining databases on secondary (disk) storage.

After presenting the basic properties of binary search trees, the following
sections show how to walk a binary search tree to print its values in sorted order, how
to search for a value in a binary search tree, how to find the minimum or maximum
element, how to find the predecessor or successor of an element, and how to insert
into or delete from a binary search tree. The basic mathematical properties of trees
appear in Appendix B.


12.1  What  is  a  binary  search  tree? 

A binary search tree is organized, as the name suggests, in a binary tree, as shown
in Figure 12.1. We can represent such a tree by a linked data structure in which
each node is an object. In addition to a key and satellite data, each node contains
attributes left, right, and p that point to the nodes corresponding to its left child,
its right child, and its parent, respectively. If a child or the parent is missing, the
appropriate attribute contains the value NIL. The root node is the only node in the
tree whose parent is NIL.

Figure 12.1  Binary search trees. For any node x, the keys in the left subtree of x are at most x.key,
and the keys in the right subtree of x are at least x.key. Different binary search trees can represent
the same set of values. The worst-case running time for most search-tree operations is proportional
to the height of the tree. (a) A binary search tree on 6 nodes with height 2. (b) A less efficient binary
search tree with height 4 that contains the same keys.

The  keys  in  a  binary  search  tree  are  always  stored  in  such  a  way  as  to  satisfy  the 
binary-search-tree  property. 

Let x be a node in a binary search tree. If y is a node in the left subtree
of x, then y.key ≤ x.key. If y is a node in the right subtree of x, then
y.key ≥ x.key.

Thus,  in  Figure  12.1(a),  the  key  of  the  root  is  6,  the  keys  2,  5,  and  5  in  its  left 
subtree  are  no  larger  than  6,  and  the  keys  7  and  8  in  its  right  subtree  are  no  smaller 
than  6.  The  same  property  holds  for  every  node  in  the  tree.  For  example,  the  key  5 
in  the  root’s  left  child  is  no  smaller  than  the  key  2  in  that  node’s  left  subtree  and  no 
larger  than  the  key  5  in  the  right  subtree. 
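
As a hedged illustration, the property can be checked mechanically. The Python sketch below uses hypothetical names (`Node`, `satisfies_bst_property`) and passes down the range of keys permitted in each subtree; the example tree follows the description of Figure 12.1(a) above, with the arrangement of keys 7 and 8 assumed.

```python
class Node:
    """A binary-tree node holding a key and links to its two children."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def satisfies_bst_property(x, lo=None, hi=None):
    """Return True if every key in x's subtree lies in [lo, hi] (None = unbounded)
    and the same holds recursively, so that left keys <= x.key <= right keys."""
    if x is None:
        return True
    if lo is not None and x.key < lo:
        return False
    if hi is not None and x.key > hi:
        return False
    return (satisfies_bst_property(x.left, lo, x.key) and
            satisfies_bst_property(x.right, x.key, hi))

# Root 6; left subtree holds 2, 5, 5; right subtree holds 7, 8 (shape assumed).
root = Node(6, Node(5, Node(2), Node(5)), Node(7, None, Node(8)))
print(satisfies_bst_property(root))   # True
```

Checking only each node against its immediate children would not be enough, since a key deep in the left subtree could still exceed the root; carrying the bounds down avoids that mistake.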

The  binary-search-tree  property  allows  us  to  print  out  all  the  keys  in  a  binary 
search  tree  in  sorted  order  by  a  simple  recursive  algorithm,  called  an  inorder  tree 
walk.  This  algorithm  is  so  named  because  it  prints  the  key  of  the  root  of  a  subtree 
between  printing  the  values  in  its  left  subtree  and  printing  those  in  its  right  subtree. 
(Similarly,  a  preorder  tree  walk  prints  the  root  before  the  values  in  either  subtree, 
and  a  postorder  tree  walk  prints  the  root  after  the  values  in  its  subtrees.)  To  use 
the  following  procedure  to  print  all  the  elements  in  a  binary  search  tree  T,  we  call 

INORDER-TREE-WALK(T.root).

INORDER-TREE-WALK(x)
1  if x ≠ NIL
2      INORDER-TREE-WALK(x.left)
3      print x.key
4      INORDER-TREE-WALK(x.right)

As  an  example,  the  inorder  tree  walk  prints  the  keys  in  each  of  the  two  binary 
search  trees  from  Figure  12.1  in  the  order  2,  5,  5,  6,  7,  8.  The  correctness  of  the 
algorithm  follows  by  induction  directly  from  the  binary-search-tree  property. 
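
The walk translates almost line for line into Python. The sketch below is illustrative only: the `Node` class, the `visit` callback, and the two tree shapes (chosen to hold the keys 2, 5, 5, 6, 7, 8 with heights 2 and 4) are assumptions rather than part of the text.

```python
class Node:
    """A binary-tree node holding a key and links to its two children."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def inorder_tree_walk(x, visit=print):
    """Visit the keys of the subtree rooted at x in sorted order."""
    if x is not None:
        inorder_tree_walk(x.left, visit)
        visit(x.key)
        inorder_tree_walk(x.right, visit)

# Two different trees on the same keys (shapes assumed for illustration).
balanced = Node(6, Node(5, Node(2), Node(5)), Node(7, None, Node(8)))
skewed = Node(2, None, Node(5, None, Node(7, Node(6, Node(5)), Node(8))))

inorder_tree_walk(balanced)   # prints 2, 5, 5, 6, 7, 8, one key per line
```

Passing `visit=some_list.append` instead of the default `print` collects the keys into a list, which is convenient for comparing the two trees.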

It takes $\Theta(n)$ time to walk an n-node binary search tree, since after the
initial call, the procedure calls itself recursively exactly twice for each node in the
tree: once for its left child and once for its right child. The following theorem
gives a formal proof that it takes linear time to perform an inorder tree walk.

Theorem  12.1 

If x is the root of an n-node subtree, then the call INORDER-TREE-WALK(x)
takes $\Theta(n)$ time.

Proof  Let $T(n)$ denote the time taken by INORDER-TREE-WALK when it is
called on the root of an n-node subtree. Since INORDER-TREE-WALK visits all n
nodes of the subtree, we have $T(n) = \Omega(n)$. It remains to show that $T(n) = O(n)$.

Since INORDER-TREE-WALK takes a small, constant amount of time on an
empty subtree (for the test x ≠ NIL), we have $T(0) = c$ for some constant $c > 0$.

For $n > 0$, suppose that INORDER-TREE-WALK is called on a node x whose
left subtree has k nodes and whose right subtree has $n - k - 1$ nodes. The time to
perform INORDER-TREE-WALK(x) is bounded by $T(n) \le T(k) + T(n-k-1) + d$
for some constant $d > 0$ that reflects an upper bound on the time to execute the
body of INORDER-TREE-WALK(x), exclusive of the time spent in recursive calls.

We use the substitution method to show that $T(n) = O(n)$ by proving that
$T(n) \le (c + d)n + c$. For $n = 0$, we have $(c + d) \cdot 0 + c = c = T(0)$. For $n > 0$,
applying the inductive hypothesis to both subtrees, we have

$$\begin{aligned}
T(n) &\le T(k) + T(n-k-1) + d \\
     &\le ((c+d)k + c) + ((c+d)(n-k-1) + c) + d \\
     &= (c+d)n + c - (c+d) + c + d \\
     &= (c+d)n + c ,
\end{aligned}$$

which completes the proof. ■

Exercises


12.1-1 

For the set {1, 4, 5, 10, 16, 17, 21} of keys, draw binary search trees of heights 2,
3, 4, 5, and 6.

12.1-2 

What is the difference between the binary-search-tree property and the min-heap
property (see page 153)? Can the min-heap property be used to print out the keys
of an n-node tree in sorted order in $O(n)$ time? Show how, or explain why not.


12.1-3 

Give a nonrecursive algorithm that performs an inorder tree walk. (Hint: An easy
solution uses a stack as an auxiliary data structure. A more complicated, but
elegant, solution uses no stack but assumes that we can test two pointers for equality.)


12.1-4 

Give recursive algorithms that perform preorder and postorder tree walks in $\Theta(n)$
time on a tree of n nodes.


12.1-5 

Argue that since sorting n elements takes $\Omega(n \lg n)$ time in the worst case in
the comparison model, any comparison-based algorithm for constructing a binary
search tree from an arbitrary list of n elements takes $\Omega(n \lg n)$ time in the worst
case.


12.2  Querying  a  binary  search  tree 

We often need to search for a key stored in a binary search tree. Besides the
SEARCH operation, binary search trees can support such queries as MINIMUM,
MAXIMUM, SUCCESSOR, and PREDECESSOR. In this section, we shall examine
these operations and show how to support each one in time $O(h)$ on any binary
search tree of height h.

Searching 

Figure 12.2  Queries on a binary search tree. To search for the key 13 in the tree, we follow the path
15 → 6 → 7 → 13 from the root. The minimum key in the tree is 2, which is found by following
left pointers from the root. The maximum key 20 is found by following right pointers from the root.
The successor of the node with key 15 is the node with key 17, since it is the minimum key in the
right subtree of 15. The node with key 13 has no right subtree, and thus its successor is its lowest
ancestor whose left child is also an ancestor. In this case, the node with key 15 is its successor.

We use the following procedure to search for a node with a given key in a binary
search tree. Given a pointer to the root of the tree and a key k, TREE-SEARCH
returns a pointer to a node with key k if one exists; otherwise, it returns NIL.

TREE-SEARCH(x, k)
1  if x == NIL or k == x.key
2      return x
3  if k < x.key
4      return TREE-SEARCH(x.left, k)
5  else return TREE-SEARCH(x.right, k)

The procedure begins its search at the root and traces a simple path downward in
the tree, as shown in Figure 12.2. For each node x it encounters, it compares the
key k with x.key. If the two keys are equal, the search terminates. If k is smaller
than x.key, the search continues in the left subtree of x, since the
binary-search-tree property implies that k could not be stored in the right subtree. Symmetrically,
if k is larger than x.key, the search continues in the right subtree. The nodes
encountered during the recursion form a simple path downward from the root of
the tree, and thus the running time of TREE-SEARCH is $O(h)$, where h is the height
of the tree.

We  can  rewrite  this  procedure  in  an  iterative  fashion  by  “unrolling”  the  recursion 
into  a  while  loop.  On  most  computers,  the  iterative  version  is  more  efficient. 



ITERATIVE-TREE-SEARCH(x, k)
1  while x ≠ NIL and k ≠ x.key
2      if k < x.key
3          x = x.left
4      else x = x.right
5  return x
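
Both the recursive and the iterative search can be sketched in Python as follows. The names are hypothetical, `None` plays the role of NIL, and the example tree is an assumed shape chosen to be consistent with the search path 15 → 6 → 7 → 13 described for Figure 12.2.

```python
class Node:
    """A binary-tree node holding a key and links to its two children."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def tree_search(x, k):
    """Recursive search: return a node with key k in x's subtree, or None."""
    if x is None or k == x.key:
        return x
    if k < x.key:
        return tree_search(x.left, k)
    return tree_search(x.right, k)

def iterative_tree_search(x, k):
    """Same search with the recursion unrolled into a loop."""
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x

# Assumed example tree with keys 2, 3, 4, 6, 7, 9, 13, 15, 17, 18, 20.
root = Node(15,
            Node(6, Node(3, Node(2), Node(4)),
                    Node(7, None, Node(13, Node(9)))),
            Node(18, Node(17), Node(20)))

print(iterative_tree_search(root, 13).key)   # 13
print(tree_search(root, 14))                 # None
```

In Python the iterative form also avoids the interpreter's recursion limit on very tall trees, which is one practical reason to prefer it there.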


Minimum  and  maximum 

We can always find an element in a binary search tree whose key is a minimum by
following left child pointers from the root until we encounter a NIL, as shown in
Figure 12.2. The following procedure returns a pointer to the minimum element in
the subtree rooted at a given node x, which we assume to be non-NIL:

TREE-MINIMUM(x)
1  while x.left ≠ NIL
2      x = x.left
3  return x

The  binary-search-tree  property  guarantees  that  Tree-Minimum  is  correct.  If  a 
node  x  has  no  left  subtree,  then  since  every  key  in  the  right  subtree  of  x  is  at  least  as 
large  as  x.key,  the  minimum  key  in  the  subtree  rooted  at  x  is  x.key.  If  node  x  has 
a  left  subtree,  then  since  no  key  in  the  right  subtree  is  smaller  than  x.key  and  every 
key  in  the  left  subtree  is  not  larger  than  x.key,  the  minimum  key  in  the  subtree 
rooted  at  x  resides  in  the  subtree  rooted  at  x.left. 

The  pseudocode  for  Tree-Maximum  is  symmetric: 

TREE-MAXIMUM(x)
1  while x.right ≠ NIL
2      x = x.right
3  return x

Both of these procedures run in $O(h)$ time on a tree of height h since, as in
TREE-SEARCH, the sequence of nodes encountered forms a simple path downward from
the root.
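
A minimal Python sketch of the two procedures follows; the names and the example tree (an assumed shape with minimum key 2 and maximum key 20) are illustrative. As in the pseudocode, the argument is assumed to be a real node rather than `None`.

```python
class Node:
    """A binary-tree node holding a key and links to its two children."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def tree_minimum(x):
    """Follow left pointers from x (assumed non-None) to the smallest key."""
    while x.left is not None:
        x = x.left
    return x

def tree_maximum(x):
    """Follow right pointers from x (assumed non-None) to the largest key."""
    while x.right is not None:
        x = x.right
    return x

# Assumed example tree with keys 2, 3, 4, 6, 7, 9, 13, 15, 17, 18, 20.
root = Node(15,
            Node(6, Node(3, Node(2), Node(4)),
                    Node(7, None, Node(13, Node(9)))),
            Node(18, Node(17), Node(20)))

print(tree_minimum(root).key, tree_maximum(root).key)   # 2 20
```

Neither function compares keys; the answer is determined entirely by the shape of the tree.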

Successor  and  predecessor 

Given a node in a binary search tree, sometimes we need to find its successor in
the sorted order determined by an inorder tree walk. If all keys are distinct, the
successor of a node x is the node with the smallest key greater than x.key. The
structure of a binary search tree allows us to determine the successor of a node
without ever comparing keys. The following procedure returns the successor of a
node x in a binary search tree if it exists, and NIL if x has the largest key in the
tree:

TREE-SUCCESSOR(x)
1  if x.right ≠ NIL
2      return TREE-MINIMUM(x.right)
3  y = x.p
4  while y ≠ NIL and x == y.right
5      x = y
6      y = y.p
7  return y

We break the code for TREE-SUCCESSOR into two cases. If the right subtree
of node x is nonempty, then the successor of x is just the leftmost node in x’s
right subtree, which we find in line 2 by calling TREE-MINIMUM(x.right). For
example, the successor of the node with key 15 in Figure 12.2 is the node with
key 17.

On  the  other  hand,  as  Exercise  12.2-6  asks  you  to  show,  if  the  right  subtree  of 
node  x  is  empty  and  x  has  a  successor  y,  then  y  is  the  lowest  ancestor  of  x  whose 
left  child  is  also  an  ancestor  of  x.  In  Figure  12.2,  the  successor  of  the  node  with 
key  13  is  the  node  with  key  15.  To  find  y,  we  simply  go  up  the  tree  from  x  until  we 
encounter  a  node  that  is  the  left  child  of  its  parent;  lines  3-7  of  TREE-SUCCESSOR 
handle  this  case. 

The running time of TREE-SUCCESSOR on a tree of height h is $O(h)$, since we
either follow a simple path up the tree or follow a simple path down the tree. The
procedure TREE-PREDECESSOR, which is symmetric to TREE-SUCCESSOR, also
runs in time $O(h)$.

Even if keys are not distinct, we define the successor and predecessor of any
node x as the node returned by calls made to TREE-SUCCESSOR(x) and
TREE-PREDECESSOR(x), respectively.
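
The successor procedure needs parent pointers, which the Python sketch below maintains in a hypothetical `Node` constructor. The helper `build_example_tree` and the tree shape are assumptions, chosen so that the successor of 15 is 17 and the successor of 13 is 15, as in the discussion above.

```python
class Node:
    """A node with key, children, and a parent pointer p set by the constructor."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.p = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.p = self

def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x

def tree_successor(x):
    """Return the next node after x in inorder, or None if x is last."""
    if x.right is not None:
        return tree_minimum(x.right)      # leftmost node of the right subtree
    y = x.p
    while y is not None and x is y.right: # climb while we arrive from the right
        x = y
        y = y.p
    return y

def build_example_tree():
    """Assumed tree with keys 2, 3, 4, 6, 7, 9, 13, 15, 17, 18, 20."""
    return Node(15,
                Node(6, Node(3, Node(2), Node(4)),
                        Node(7, None, Node(13, Node(9)))),
                Node(18, Node(17), Node(20)))

root = build_example_tree()
node, keys = tree_minimum(root), []
while node is not None:                   # walk the whole tree by successors
    keys.append(node.key)
    node = tree_successor(node)
print(keys)   # [2, 3, 4, 6, 7, 9, 13, 15, 17, 18, 20]
```

The comparison `x is y.right` tests node identity rather than key equality, which matches the claim that no keys need to be compared.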

In  summary,  we  have  proved  the  following  theorem. 

Theorem  12.2 

We can implement the dynamic-set operations SEARCH, MINIMUM, MAXIMUM,
SUCCESSOR, and PREDECESSOR so that each one runs in $O(h)$ time on a binary
search tree of height h. ■


Exercises


12.2-1 

Suppose  that  we  have  numbers  between  1  and  1000  in  a  binary  search  tree,  and  we 
want  to  search  for  the  number  363.  Which  of  the  following  sequences  could  not  be 
the  sequence  of  nodes  examined? 

a.  2,  252,  401,  398,  330,  344,  397,  363. 

b.  924,  220,  911,  244,  898,  258,  362,  363. 

c.  925, 202, 911, 240, 912, 245, 363.

d.  2,  399,  387,  219,  266,  382,  381,  278,  363. 

e.  935,  278,  347,  621,  299,  392,  358,  363. 


12.2-2 

Write  recursive  versions  of  Tree-Minimum  and  Tree-Maximum. 


12.2-3 

Write  the  Tree-Predecessor  procedure. 


12.2-4 

Professor Bunyan thinks he has discovered a remarkable property of binary search
trees. Suppose that the search for key k in a binary search tree ends up in a leaf.
Consider three sets: A, the keys to the left of the search path; B, the keys on the
search path; and C, the keys to the right of the search path. Professor Bunyan
claims that any three keys $a \in A$, $b \in B$, and $c \in C$ must satisfy $a \le b \le c$. Give
a smallest possible counterexample to the professor’s claim.


12.2-5 

Show  that  if  a  node  in  a  binary  search  tree  has  two  children,  then  its  successor  has 
no  left  child  and  its  predecessor  has  no  right  child. 


12.2-6 

Consider  a  binary  search  tree  T  whose  keys  are  distinct.  Show  that  if  the  right 
subtree  of  a  node  x  in  T  is  empty  and  x  has  a  successor  y,  then  y  is  the  lowest 
ancestor  of  x  whose  left  child  is  also  an  ancestor  of  x.  (Recall  that  every  node  is 
its  own  ancestor.) 


12.2-7 

An alternative method of performing an inorder tree walk of an n-node binary
search tree finds the minimum element in the tree by calling TREE-MINIMUM and
then making $n - 1$ calls to TREE-SUCCESSOR. Prove that this algorithm runs
in $\Theta(n)$ time.

12.2-8

Prove that no matter what node we start at in a height-h binary search tree, k
successive calls to TREE-SUCCESSOR take $O(k + h)$ time.


12.2-9 

Let T be a binary search tree whose keys are distinct, let x be a leaf node, and let y
be its parent. Show that y.key is either the smallest key in T larger than x.key or
the largest key in T smaller than x.key.


12.3  Insertion  and  deletion 

The  operations  of  insertion  and  deletion  cause  the  dynamic  set  represented  by  a 
binary  search  tree  to  change.  The  data  structure  must  be  modified  to  reflect  this 
change,  but  in  such  a  way  that  the  binary-search-tree  property  continues  to  hold. 
As we shall see, modifying the tree to insert a new element is relatively
straightforward, but handling deletion is somewhat more intricate.

Insertion 

To insert a new value v into a binary search tree T, we use the procedure
TREE-INSERT. The procedure takes a node z for which z.key = v, z.left = NIL,
and z.right = NIL. It modifies T and some of the attributes of z in such a way that
it inserts z into an appropriate position in the tree.

TREE-INSERT(T, z)
1   y = NIL
2   x = T.root
3   while x ≠ NIL
4       y = x
5       if z.key < x.key
6           x = x.left
7       else x = x.right
8   z.p = y
9   if y == NIL
10      T.root = z          // tree T was empty
11  elseif z.key < y.key
12      y.left = z
13  else y.right = z

Figure 12.3  Inserting an item with key 13 into a binary search tree. Lightly shaded nodes indicate
the simple path from the root down to the position where the item is inserted. The dashed line
indicates the link in the tree that is added to insert the item.

Figure 12.3 shows how TREE-INSERT works. Just like the procedures
TREE-SEARCH and ITERATIVE-TREE-SEARCH, TREE-INSERT begins at the root of the
tree and the pointer x traces a simple path downward looking for a NIL to replace
with the input item z. The procedure maintains the trailing pointer y as the parent
of x. After initialization, the while loop in lines 3-7 causes these two pointers
to move down the tree, going left or right depending on the comparison of z.key
with x.key, until x becomes NIL. This NIL occupies the position where we wish to
place the input item z. We need the trailing pointer y, because by the time we find
the NIL where z belongs, the search has proceeded one step beyond the node that
needs to be changed. Lines 8-13 set the pointers that cause z to be inserted.

Like the other primitive operations on search trees, the procedure TREE-INSERT
runs in $O(h)$ time on a tree of height h.
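
The insertion procedure can be sketched in Python as below. The `Node` and `Tree` classes are hypothetical stand-ins for the objects in the pseudocode, and the keys inserted before 13 are an assumption chosen only to give the new node a nontrivial search path.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

class Tree:
    def __init__(self):
        self.root = None

def tree_insert(T, z):
    """Insert node z into tree T, keeping the binary-search-tree property."""
    y = None                      # trailing pointer: parent of x
    x = T.root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:
        T.root = z                # tree T was empty
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z

T = Tree()
for key in [12, 5, 18, 2, 9, 15, 19, 17]:   # assumed starting keys
    tree_insert(T, Node(key))
z = Node(13)
tree_insert(T, z)
print(z.p.key)   # 15
```

Here the search for 13 passes through 12, 18, and 15 before reaching an empty left link, so the new node becomes the left child of 15.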

Deletion 

The  overall  strategy  for  deleting  a  node  z  from  a  binary  search  tree  T  has  three 
basic  cases  but,  as  we  shall  see,  one  of  the  cases  is  a  bit  tricky. 

•  If z has no children, then we simply remove it by modifying its parent to
replace z with NIL as its child.

•  If z has just one child, then we elevate that child to take z’s position in the tree
by modifying z’s parent to replace z by z’s child.

•  If z has two children, then we find z’s successor y (which must be in z’s right
subtree) and have y take z’s position in the tree. The rest of z’s original right
subtree becomes y’s new right subtree, and z’s left subtree becomes y’s new
left subtree. This case is the tricky one because, as we shall see, it matters
whether y is z’s right child.

The procedure for deleting a given node z from a binary search tree T takes as
arguments pointers to T and z. It organizes its cases a bit differently from the three
cases outlined previously by considering the four cases shown in Figure 12.4.

•  If z has no left child (part (a) of the figure), then we replace z by its right child,
which may or may not be NIL. When z’s right child is NIL, this case deals with
the situation in which z has no children. When z’s right child is non-NIL, this
case handles the situation in which z has just one child, which is its right child.

•  If z has just one child, which is its left child (part (b) of the figure), then we
replace z by its left child.

•  Otherwise, z has both a left and a right child. We find z’s successor y, which
lies in z’s right subtree and has no left child (see Exercise 12.2-5). We want to
splice y out of its current location and have it replace z in the tree.

    •  If y is z’s right child (part (c)), then we replace z by y, leaving y’s right
    child alone.

    •  Otherwise, y lies within z’s right subtree but is not z’s right child (part (d)).
    In this case, we first replace y by its own right child, and then we replace z
    by y.

In order to move subtrees around within the binary search tree, we define a
subroutine TRANSPLANT, which replaces one subtree as a child of its parent with
another subtree. When TRANSPLANT replaces the subtree rooted at node u with
the subtree rooted at node v, node u’s parent becomes node v’s parent, and u’s
parent ends up having v as its appropriate child.

TRANSPLANT(T, u, v)
1  if u.p == NIL
2      T.root = v
3  elseif u == u.p.left
4      u.p.left = v
5  else u.p.right = v
6  if v ≠ NIL
7      v.p = u.p

Lines 1-2 handle the case in which u is the root of T. Otherwise, u is either a left
child or a right child of its parent. Lines 3-4 take care of updating u.p.left if u
is a left child, and line 5 updates u.p.right if u is a right child. We allow v to be
NIL, and lines 6-7 update v.p if v is non-NIL. Note that TRANSPLANT does not
attempt to update v.left and v.right; doing so, or not doing so, is the responsibility
of TRANSPLANT’s caller.

Figure 12.4  Deleting a node z from a binary search tree. Node z may be the root, a left child of
node q, or a right child of q. (a) Node z has no left child. We replace z by its right child r, which
may or may not be NIL. (b) Node z has a left child l but no right child. We replace z by l. (c) Node z
has two children; its left child is node l, its right child is its successor y, and y’s right child is node x.
We replace z by y, updating y’s left child to become l, but leaving x as y’s right child. (d) Node z
has two children (left child l and right child r), and its successor y ≠ r lies within the subtree rooted
at r. We replace y by its own right child x, and we set y to be r’s parent. Then, we set y to be q’s
child and the parent of l.

With the TRANSPLANT procedure in hand, here is the procedure that deletes
node z from binary search tree T:

TREE-DELETE(T, z)
1   if z.left == NIL
2       TRANSPLANT(T, z, z.right)
3   elseif z.right == NIL
4       TRANSPLANT(T, z, z.left)
5   else y = TREE-MINIMUM(z.right)
6       if y.p ≠ z
7           TRANSPLANT(T, y, y.right)
8           y.right = z.right
9           y.right.p = y
10      TRANSPLANT(T, z, y)
11      y.left = z.left
12      y.left.p = y

The TREE-DELETE procedure executes the four cases as follows. Lines 1-2
handle the case in which node z has no left child, and lines 3-4 handle the case in
which z has a left child but no right child. Lines 5-12 deal with the remaining two
cases, in which z has two children. Line 5 finds node y, which is the successor
of z. Because z has a nonempty right subtree, its successor must be the node in
that subtree with the smallest key; hence the call to TREE-MINIMUM(z.right). As
we noted before, y has no left child. We want to splice y out of its current location,
and it should replace z in the tree. If y is z’s right child, then lines 10-12 replace z
as a child of its parent by y and replace y’s left child by z’s left child. If y is
not z’s right child, lines 7-9 replace y as a child of its parent by y’s right child and
turn z’s right child into y’s right child, and then lines 10-12 replace z as a child of
its parent by y and replace y’s left child by z’s left child.

Each line of Tree-Delete, including the calls to Transplant, takes constant
time, except for the call to Tree-Minimum in line 5. Thus, Tree-Delete runs
in O(h) time on a tree of height h.

In summary, we have proved the following theorem.

Theorem 12.3

We can implement the dynamic-set operations Insert and Delete so that each
one runs in O(h) time on a binary search tree of height h. ■
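The deletion procedure can be tried out directly. The following is a minimal Python sketch of Transplant and Tree-Delete, assuming that NIL is represented by None and that nodes carry the attributes key, left, right, and p. The names Node, Tree, tree_insert, and inorder are our own helpers for the illustration and are not part of the pseudocode above.

```python
class Node:
    """A binary-search-tree node with key, left, right, and parent p."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.p = None


class Tree:
    def __init__(self):
        self.root = None


def tree_insert(T, z):
    """Ordinary BST insertion, used here only to build an example tree."""
    y, x = None, T.root
    while x is not None:
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is None:
        T.root = z
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z


def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x


def transplant(T, u, v):
    """Replace the subtree rooted at u by the subtree rooted at v."""
    if u.p is None:
        T.root = v
    elif u is u.p.left:
        u.p.left = v
    else:
        u.p.right = v
    if v is not None:
        v.p = u.p


def tree_delete(T, z):
    if z.left is None:
        transplant(T, z, z.right)        # lines 1-2: no left child
    elif z.right is None:
        transplant(T, z, z.left)         # lines 3-4: left child only
    else:
        y = tree_minimum(z.right)        # line 5: successor of z
        if y.p is not z:                 # lines 6-9: y lies deeper in z's right subtree
            transplant(T, y, y.right)
            y.right = z.right
            y.right.p = y
        transplant(T, z, y)              # lines 10-12: y takes z's place
        y.left = z.left
        y.left.p = y


def inorder(x):
    return [] if x is None else inorder(x.left) + [x.key] + inorder(x.right)


T = Tree()
nodes = {k: Node(k) for k in [12, 5, 18, 2, 9, 15, 19, 13, 17]}
for k in [12, 5, 18, 2, 9, 15, 19, 13, 17]:
    tree_insert(T, nodes[k])
tree_delete(T, nodes[12])                # z is the root; its successor 13 is not its right child
print(T.root.key, inorder(T.root))
```

Because the node object for y is moved into z's position rather than having its key copied, any outside reference to the node with key 13 stays valid after the deletion.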
Exercises


12.3-1 

Give  a  recursive  version  of  the  Tree-Insert  procedure. 


12.3-2 

Suppose that we construct a binary search tree by repeatedly inserting distinct
values into the tree. Argue that the number of nodes examined in searching for a
value in the tree is one plus the number of nodes examined when the value was
first inserted into the tree.


12.3-3 

We can sort a given set of n numbers by first building a binary search tree containing
these numbers (using Tree-Insert repeatedly to insert the numbers one by
one) and then printing the numbers by an inorder tree walk. What are the worst-case
and best-case running times for this sorting algorithm?


12.3-4 

Is  the  operation  of  deletion  “commutative”  in  the  sense  that  deleting  x  and  then  y 
from  a  binary  search  tree  leaves  the  same  tree  as  deleting  y  and  then  x?  Argue  why 
it  is  or  give  a  counterexample. 


12.3-5 

Suppose that instead of each node x keeping the attribute x.p, pointing to x's
parent, it keeps x.succ, pointing to x's successor. Give pseudocode for Search,
Insert, and Delete on a binary search tree T using this representation. These
procedures should operate in time O(h), where h is the height of the tree T. (Hint:
You may wish to implement a subroutine that returns the parent of a node.)


12.3-6 

When  node  z  in  Tree-Delete  has  two  children,  we  could  choose  node  y  as 
its  predecessor  rather  than  its  successor.  What  other  changes  to  Tree-Delete 
would  be  necessary  if  we  did  so?  Some  have  argued  that  a  fair  strategy,  giving 
equal  priority  to  predecessor  and  successor,  yields  better  empirical  performance. 
How  might  Tree-Delete  be  changed  to  implement  such  a  fair  strategy? 


★  12.4  Randomly  built  binary  search  trees 

We have shown that each of the basic operations on a binary search tree runs
in O(h) time, where h is the height of the tree. The height of a binary search
tree varies, however, as items are inserted and deleted. If, for example, the n items
are inserted in strictly increasing order, the tree will be a chain with height n - 1.
On the other hand, Exercise B.5-4 shows that h ≥ ⌊lg n⌋. As with quicksort, we
can show that the behavior of the average case is much closer to the best case than
to the worst case.

Unfortunately,  little  is  known  about  the  average  height  of  a  binary  search  tree 
when  both  insertion  and  deletion  are  used  to  create  it.  When  the  tree  is  created 
by  insertion  alone,  the  analysis  becomes  more  tractable.  Let  us  therefore  define  a 
randomly  built  binary  search  tree  on  n  keys  as  one  that  arises  from  inserting  the 
keys in random order into an initially empty tree, where each of the n! permutations
of  the  input  keys  is  equally  likely.  (Exercise  12.4-3  asks  you  to  show  that  this  notion 
is  different  from  assuming  that  every  binary  search  tree  on  n  keys  is  equally  likely.) 
In  this  section,  we  shall  prove  the  following  theorem. 
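To get a feel for the definition, the following Python sketch builds trees by insertion alone and reports their heights. It is only an experiment under our own assumptions: the function names, the dictionary representation of the tree, the choice n = 1023, the 20 trials, and the fixed seed are all arbitrary choices made for the illustration.

```python
import math
import random


def bst_height(keys):
    """Insert distinct keys in the given order into an empty BST; return its height.

    The tree is a dict mapping each key to [left child key, right child key].
    The empty tree has height -1 and a one-node tree has height 0.
    """
    children = {}
    root = None
    height = -1
    for k in keys:
        depth = 0
        if root is None:
            root = k
        else:
            x = root
            while True:
                depth += 1
                side = 0 if k < x else 1
                if children[x][side] is None:
                    children[x][side] = k
                    break
                x = children[x][side]
        children[k] = [None, None]
        height = max(height, depth)
    return height


def random_build_height(n, rng):
    keys = list(range(1, n + 1))
    rng.shuffle(keys)                 # each of the n! insertion orders is equally likely
    return bst_height(keys)


n = 1023
rng = random.Random(0)                # fixed seed only so that runs are repeatable
heights = [random_build_height(n, rng) for _ in range(20)]
print("increasing order:", bst_height(range(1, n + 1)))    # a chain of height n - 1
print("lower bound     :", math.floor(math.log2(n)))       # floor(lg n)
print("random builds   :", min(heights), "to", max(heights))
```

Every height produced this way must lie between ⌊lg n⌋ and n - 1; where the random builds tend to fall within that range is the subject of the theorem that follows.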

Theorem  12.4 

The expected height of a randomly built binary search tree on n distinct keys is
O(lg n).

Proof We start by defining three random variables that help measure the height
of a randomly built binary search tree. We denote the height of a randomly built
binary search tree on n keys by $X_n$, and we define the exponential height $Y_n = 2^{X_n}$.
When we build a binary search tree on n keys, we choose one key as that of the
root, and we let $R_n$ denote the random variable that holds this key's rank within
the set of n keys; that is, $R_n$ holds the position that this key would occupy if the
set of keys were sorted. The value of $R_n$ is equally likely to be any element of the
set {1, 2, ..., n}. If $R_n = i$, then the left subtree of the root is a randomly built
binary search tree on i - 1 keys, and the right subtree is a randomly built binary
search tree on n - i keys. Because the height of a binary tree is 1 more than the
larger of the heights of the two subtrees of the root, the exponential height of a
binary tree is twice the larger of the exponential heights of the two subtrees of the
root. If we know that $R_n = i$, it follows that

\[ Y_n = 2 \cdot \max(Y_{i-1}, Y_{n-i}) . \]

As base cases, we have that $Y_1 = 1$, because the exponential height of a tree with 1
node is $2^0 = 1$ and, for convenience, we define $Y_0 = 0$.

Next, define indicator random variables $Z_{n,1}, Z_{n,2}, \ldots, Z_{n,n}$, where

\[ Z_{n,i} = I\{R_n = i\} . \]

Because $R_n$ is equally likely to be any element of {1, 2, ..., n}, it follows that
$\Pr\{R_n = i\} = 1/n$ for i = 1, 2, ..., n, and hence, by Lemma 5.1, we have

\[ E[Z_{n,i}] = 1/n , \qquad (12.1) \]

for i = 1, 2, ..., n. Because exactly one value of $Z_{n,i}$ is 1 and all others are 0, we
also have

\[ Y_n = \sum_{i=1}^{n} Z_{n,i} \, (2 \cdot \max(Y_{i-1}, Y_{n-i})) . \]

We shall show that $E[Y_n]$ is polynomial in n, which will ultimately imply that
$E[X_n] = O(\lg n)$.


We claim that the indicator random variable $Z_{n,i} = I\{R_n = i\}$ is independent
of the values of $Y_{i-1}$ and $Y_{n-i}$. Having chosen $R_n = i$, the left subtree (whose
exponential height is $Y_{i-1}$) is randomly built on the i - 1 keys whose ranks are
less than i. This subtree is just like any other randomly built binary search tree
on i - 1 keys. Other than the number of keys it contains, this subtree's structure
is not affected at all by the choice of $R_n = i$, and hence the random variables
$Y_{i-1}$ and $Z_{n,i}$ are independent. Likewise, the right subtree, whose exponential
height is $Y_{n-i}$, is randomly built on the n - i keys whose ranks are greater than i.
Its structure is independent of the value of $R_n$, and so the random variables $Y_{n-i}$
and $Z_{n,i}$ are independent. Hence, we have


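\[
\begin{aligned}
E[Y_n] &= E\left[ \sum_{i=1}^{n} Z_{n,i} \, (2 \cdot \max(Y_{i-1}, Y_{n-i})) \right] \\
       &= \sum_{i=1}^{n} E[Z_{n,i} \, (2 \cdot \max(Y_{i-1}, Y_{n-i}))] && \text{(by linearity of expectation)} \\
       &= \sum_{i=1}^{n} E[Z_{n,i}] \, E[2 \cdot \max(Y_{i-1}, Y_{n-i})] && \text{(by independence)} \\
       &= \sum_{i=1}^{n} \frac{1}{n} \, E[2 \cdot \max(Y_{i-1}, Y_{n-i})] && \text{(by equation (12.1))} \\
       &= \frac{2}{n} \sum_{i=1}^{n} E[\max(Y_{i-1}, Y_{n-i})] && \text{(by equation (C.22))} \\
       &\le \frac{2}{n} \sum_{i=1}^{n} (E[Y_{i-1}] + E[Y_{n-i}]) && \text{(by Exercise C.3-4)} .
\end{aligned}
\]

Since each term $E[Y_0], E[Y_1], \ldots, E[Y_{n-1}]$ appears twice in the last summation,
once as $E[Y_{i-1}]$ and once as $E[Y_{n-i}]$, we have the recurrence

\[ E[Y_n] \le \frac{4}{n} \sum_{i=0}^{n-1} E[Y_i] . \qquad (12.2) \]

Using the substitution method, we shall show that for all positive integers n, the
recurrence (12.2) has the solution

\[ E[Y_n] \le \frac{1}{4} \binom{n+3}{3} . \]

In doing so, we shall use the identity

\[ \sum_{i=0}^{n-1} \binom{i+3}{3} = \binom{n+3}{4} . \qquad (12.3) \]

(Exercise 12.4-1 asks you to prove this identity.)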

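For the base cases, we note that the bounds $0 = Y_0 = E[Y_0] \le (1/4)\binom{3}{3} = 1/4$
and $1 = Y_1 = E[Y_1] \le (1/4)\binom{1+3}{3} = 1$ hold. For the inductive case, we have that

\[
\begin{aligned}
E[Y_n] &\le \frac{4}{n} \sum_{i=0}^{n-1} E[Y_i] \\
       &\le \frac{4}{n} \sum_{i=0}^{n-1} \frac{1}{4} \binom{i+3}{3} && \text{(by the inductive hypothesis)} \\
       &= \frac{1}{n} \sum_{i=0}^{n-1} \binom{i+3}{3} \\
       &= \frac{1}{n} \binom{n+3}{4} && \text{(by equation (12.3))} \\
       &= \frac{1}{n} \cdot \frac{(n+3)!}{4! \, (n-1)!} \\
       &= \frac{1}{4} \cdot \frac{(n+3)!}{3! \, n!} \\
       &= \frac{1}{4} \binom{n+3}{3} .
\end{aligned}
\]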

We have bounded $E[Y_n]$, but our ultimate goal is to bound $E[X_n]$. As Exercise 12.4-4
asks you to show, the function $f(x) = 2^x$ is convex (see page 1199).
Therefore, we can employ Jensen's inequality (C.26), which says that

\[ 2^{E[X_n]} \le E[2^{X_n}] = E[Y_n] , \]

as follows:

\[
\begin{aligned}
2^{E[X_n]} &\le \frac{1}{4} \binom{n+3}{3} \\
           &= \frac{1}{4} \cdot \frac{(n+3)(n+2)(n+1)}{6} \\
           &= \frac{n^3 + 6n^2 + 11n + 6}{24} .
\end{aligned}
\]

Taking logarithms of both sides gives $E[X_n] = O(\lg n)$. ■
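For very small n, we can compute $E[X_n]$ and $E[Y_n]$ exactly by enumerating all n! insertion orders, and compare them with the bounds in the proof. The following Python sketch does so for n up to 7. The recursive height function and the range of n are our own choices for the illustration; brute-force enumeration quickly becomes infeasible for larger n.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, log2


def height(keys):
    """Height of the BST built by inserting keys in order (one node: 0, empty: -1)."""
    if not keys:
        return -1
    root, rest = keys[0], keys[1:]
    left = [k for k in rest if k < root]      # keys that end up in the left subtree
    right = [k for k in rest if k > root]     # keys that end up in the right subtree
    return 1 + max(height(left), height(right))


for n in range(1, 8):
    hs = [height(p) for p in permutations(range(n))]
    e_x = Fraction(sum(hs), len(hs))                    # exact E[X_n]
    e_y = Fraction(sum(2 ** h for h in hs), len(hs))    # exact E[Y_n]
    bound = Fraction(comb(n + 3, 3), 4)                 # (1/4) * C(n+3, 3)
    # Jensen's inequality, followed by the solution of recurrence (12.2).
    assert 2 ** float(e_x) <= e_y <= bound
    print(n, float(e_x), round(log2(bound), 3))
```

Each printed row shows n, the exact expected height, and the logarithm of the bound, so the second column never exceeds the third.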

Exercises 


12.4-1 

Prove  equation  (12.3). 


12.4-2 

Describe a binary search tree on n nodes such that the average depth of a node in
the tree is Θ(lg n) but the height of the tree is ω(lg n). Give an asymptotic upper
bound on the height of an n-node binary search tree in which the average depth of
a node is Θ(lg n).


12.4-3 

Show  that  the  notion  of  a  randomly  chosen  binary  search  tree  on  n  keys,  where 
each  binary  search  tree  of  n  keys  is  equally  likely  to  be  chosen,  is  different  from 
the  notion  of  a  randomly  built  binary  search  tree  given  in  this  section.  (Hint:  List 
the  possibilities  when  n  =  3.) 


12.4-4

Show that the function $f(x) = 2^x$ is convex.


12.4-5 ★

Consider Randomized-Quicksort operating on a sequence of n distinct input
numbers. Prove that for any constant k > 0, all but $O(1/n^k)$ of the n! input
permutations yield an O(n lg n) running time.


Problems 


12-1  Binary  search  trees  with  equal  keys 

Equal  keys  pose  a  problem  for  the  implementation  of  binary  search  trees. 

a.  What  is  the  asymptotic  performance  of  Tree-Insert  when  used  to  insert  n 
items  with  identical  keys  into  an  initially  empty  binary  search  tree? 

We propose to improve Tree-Insert by testing before line 5 to determine whether
z.key = x.key and by testing before line 11 to determine whether z.key = y.key.
If equality holds, we implement one of the following strategies. For each strategy,
find the asymptotic performance of inserting n items with identical keys into an
initially empty binary search tree. (The strategies are described for line 5, in which
we compare the keys of z and x. Substitute y for x to arrive at the strategies for
line 11.)

b. Keep a boolean flag x.b at node x, and set x to either x.left or x.right based
on the value of x.b, which alternates between FALSE and TRUE each time we
visit x while inserting a node with the same key as x.

c. Keep a list of nodes with equal keys at x, and insert z into the list.

d. Randomly set x to either x.left or x.right. (Give the worst-case performance
and informally derive the expected running time.)

12-2  Radix  trees 

Given two strings $a = a_0 a_1 \ldots a_p$ and $b = b_0 b_1 \ldots b_q$, where each $a_i$ and each $b_j$
is in some ordered set of characters, we say that string a is lexicographically less
than string b if either

1. there exists an integer j, where $0 \le j \le \min(p, q)$, such that $a_i = b_i$ for all
i = 0, 1, ..., j - 1 and $a_j < b_j$, or

2. p < q and $a_i = b_i$ for all i = 0, 1, ..., p.

For example, if a and b are bit strings, then 10100 < 10110 by rule 1 (letting
j = 3) and 10100 < 101000 by rule 2. This ordering is similar to that used in
English-language dictionaries.
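The two rules translate almost directly into code. Here is a short Python sketch of the comparison, where the function name lex_less is our own and the characters are assumed to be ordered by Python's built-in comparison; it illustrates the definition only, not the sorting method the problem asks for.

```python
def lex_less(a, b):
    """Return True if string a is lexicographically less than string b."""
    p, q = len(a) - 1, len(b) - 1        # indices of the last characters, as in the text
    # Rule 1: at the first position j where the strings differ, a_j < b_j.
    for j in range(min(p, q) + 1):
        if a[j] != b[j]:
            return a[j] < b[j]
    # Rule 2: a agrees with b on positions 0..p and is strictly shorter.
    return p < q


print(lex_less("10100", "10110"))    # True, by rule 1 with j = 3
print(lex_less("10100", "101000"))   # True, by rule 2
print(lex_less("10", "10"))          # False: equal strings satisfy neither rule
```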

The radix tree data structure shown in Figure 12.5 stores the bit strings 1011,
10, 011, 100, and 0. When searching for a key $a = a_0 a_1 \ldots a_p$, we go left at a
node of depth i if $a_i = 0$ and right if $a_i = 1$. Let S be a set of distinct bit strings
whose lengths sum to n. Show how to use a radix tree to sort S lexicographically
in Θ(n) time. For the example in Figure 12.5, the output of the sort should be the
sequence 0, 011, 10, 100, 1011.

12-3 Average node depth in a randomly built binary search tree

In this problem, we prove that the average depth of a node in a randomly built
binary search tree with n nodes is O(lg n). Although this result is weaker than
that of Theorem 12.4, the technique we shall use reveals a surprising similarity
between the building of a binary search tree and the execution of
Randomized-Quicksort from Section 7.3.

We  define  the  total  path  length  P(T)  of  a  binary  tree  T  as  the  sum,  over  all 
nodes  x  in  T ,  of  the  depth  of  node  x,  which  we  denote  by  d(x,T). 
Figure 12.5 A radix tree storing the bit strings 1011, 10, 011, 100, and 0. We can determine each
node’s  key  by  traversing  the  simple  path  from  the  root  to  that  node.  There  is  no  need,  therefore,  to 
store  the  keys  in  the  nodes;  the  keys  appear  here  for  illustrative  purposes  only.  Nodes  are  heavily 
shaded  if  the  keys  corresponding  to  them  are  not  in  the  tree;  such  nodes  are  present  only  to  establish 
a  path  to  other  nodes. 

a. Argue that the average depth of a node in T is

\[ \frac{1}{n} \sum_{x \in T} d(x, T) = \frac{1}{n} P(T) . \]

Thus, we wish to show that the expected value of P(T) is O(n lg n).

b. Let $T_L$ and $T_R$ denote the left and right subtrees of tree T, respectively. Argue
that if T has n nodes, then

\[ P(T) = P(T_L) + P(T_R) + n - 1 . \]

c. Let P(n) denote the average total path length of a randomly built binary search
tree with n nodes. Show that

\[ P(n) = \frac{1}{n} \sum_{i=0}^{n-1} (P(i) + P(n-i-1) + n - 1) . \]

d. Show how to rewrite P(n) as

\[ P(n) = \frac{2}{n} \sum_{k=1}^{n-1} P(k) + \Theta(n) . \]

e. Recalling the alternative analysis of the randomized version of quicksort given
in Problem 7-3, conclude that P(n) = O(n lg n).

At each recursive invocation of quicksort, we choose a random pivot element to
partition the set of elements being sorted. Each node of a binary search tree
partitions the set of elements that fall into the subtree rooted at that node.

f. Describe an implementation of quicksort in which the comparisons to sort a set
of elements are exactly the same as the comparisons to insert the elements into
a binary search tree. (The order in which comparisons are made may differ, but
the same comparisons must occur.)

12-4  Number  of  different  binary  trees 

Let $b_n$ denote the number of different binary trees with n nodes. In this problem,
you will find a formula for $b_n$, as well as an asymptotic estimate.

a. Show that $b_0 = 1$ and that, for n ≥ 1,

\[ b_n = \sum_{k=0}^{n-1} b_k b_{n-1-k} . \]

b. Referring to Problem 4-4 for the definition of a generating function, let B(x)
be the generating function

\[ B(x) = \sum_{n=0}^{\infty} b_n x^n . \]

Show that $B(x) = xB(x)^2 + 1$, and hence one way to express B(x) in closed
form is

\[ B(x) = \frac{1}{2x} \left( 1 - \sqrt{1 - 4x} \right) . \]

The Taylor expansion of f(x) around the point x = a is given by

\[ f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!} (x - a)^k , \]

where $f^{(k)}(x)$ is the kth derivative of f evaluated at x.

c. Show that

\[ b_n = \frac{1}{n+1} \binom{2n}{n} \]

(the nth Catalan number) by using the Taylor expansion of $\sqrt{1 - 4x}$ around
x = 0. (If you wish, instead of using the Taylor expansion, you may use
the generalization of the binomial expansion (C.4) to nonintegral exponents n,
where for any real number n and for any integer k, we interpret $\binom{n}{k}$ to be
$n(n-1) \cdots (n-k+1)/k!$ if k ≥ 0, and 0 otherwise.)

d. Show that

\[ b_n = \frac{4^n}{\sqrt{\pi} \, n^{3/2}} \, (1 + O(1/n)) . \]


Chapter  notes 

Knuth  [211]  contains  a  good  discussion  of  simple  binary  search  trees  as  well  as 
many  variations.  Binary  search  trees  seem  to  have  been  independently  discovered 
by  a  number  of  people  in  the  late  1950s.  Radix  trees  are  often  called  “tries,”  which 
comes  from  the  middle  letters  in  the  word  retrieval.  Knuth  [211]  also  discusses 
them. 

Many texts, including the first two editions of this book, have a somewhat simpler
method of deleting a node from a binary search tree when both of its children
are  present.  Instead  of  replacing  node  z  by  its  successor  y,  we  delete  node  y  but 
copy  its  key  and  satellite  data  into  node  z.  The  downside  of  this  approach  is  that 
the  node  actually  deleted  might  not  be  the  node  passed  to  the  delete  procedure.  If 
other  components  of  a  program  maintain  pointers  to  nodes  in  the  tree,  they  could 
mistakenly  end  up  with  “stale”  pointers  to  nodes  that  have  been  deleted.  Although 
the  deletion  method  presented  in  this  edition  of  this  book  is  a  bit  more  complicated, 
it  guarantees  that  a  call  to  delete  node  z  deletes  node  z  and  only  node  z. 

Section  15.5  will  show  how  to  construct  an  optimal  binary  search  tree  when 
we  know  the  search  frequencies  before  constructing  the  tree.  That  is,  given  the 
frequencies  of  searching  for  each  key  and  the  frequencies  of  searching  for  values 
that  fall  between  keys  in  the  tree,  we  construct  a  binary  search  tree  for  which  a 
set  of  searches  that  follows  these  frequencies  examines  the  minimum  number  of 
nodes. 

The  proof  in  Section  12.4  that  bounds  the  expected  height  of  a  randomly  built 
binary  search  tree  is  due  to  Aslam  [24].  Martinez  and  Roura  [243]  give  randomized 
algorithms  for  insertion  into  and  deletion  from  binary  search  trees  in  which  the 
result of either operation is a random binary search tree. Their definition of a
random binary search tree differs, if only slightly, from that of a randomly built
binary search tree in this chapter, however.
Chapter 12 showed that a binary search tree of height h can support any of the basic
dynamic-set operations, such as Search, Predecessor, Successor, Minimum,
Maximum, Insert, and Delete, in O(h) time. Thus, the set operations
are fast if the height of the search tree is small. If its height is large, however, the
set operations may run no faster than with a linked list. Red-black trees are one
of many search-tree schemes that are "balanced" in order to guarantee that basic
dynamic-set operations take O(lg n) time in the worst case.


13.1  Properties  of  red-black  trees 

A  red-black  tree  is  a  binary  search  tree  with  one  extra  bit  of  storage  per  node:  its 
color,  which  can  be  either  RED  or  BLACK.  By  constraining  the  node  colors  on  any 
simple  path  from  the  root  to  a  leaf,  red-black  trees  ensure  that  no  such  path  is  more 
than  twice  as  long  as  any  other,  so  that  the  tree  is  approximately  balanced. 

Each  node  of  the  tree  now  contains  the  attributes  color,  key,  left,  right,  and  p.  If 
a  child  or  the  parent  of  a  node  does  not  exist,  the  corresponding  pointer  attribute 
of the node contains the value NIL. We shall regard these NILs as being pointers to
leaves  (external  nodes)  of  the  binary  search  tree  and  the  normal,  key-bearing  nodes 
as  being  internal  nodes  of  the  tree. 

A red-black tree is a binary tree that satisfies the following red-black properties:

1. Every node is either red or black.

2.  The  root  is  black. 

3.  Every  leaf  (NIL)  is  black. 

4.  If  a  node  is  red,  then  both  its  children  are  black. 

5.  For  each  node,  all  simple  paths  from  the  node  to  descendant  leaves  contain  the 
same  number  of  black  nodes. 
Figure 13.1(a) shows an example of a red-black tree.
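The five properties can be checked mechanically. The following Python sketch does so for a tree written as nested tuples (color, key, left, right), with None standing for a NIL leaf. The representation and the function names are assumptions made for this illustration only, and the sketch does not check the binary-search-tree ordering of the keys.

```python
RED, BLACK = "red", "black"


def black_count(node):
    """Number of black nodes on every path from node (inclusive) down to a leaf.

    Raises ValueError if property 1, 4, or 5 fails somewhere in the subtree.
    """
    if node is None:
        return 1                                   # property 3: every NIL leaf is black
    color, _key, left, right = node
    if color not in (RED, BLACK):
        raise ValueError("property 1: node is neither red nor black")
    if color == RED and any(c is not None and c[0] == RED for c in (left, right)):
        raise ValueError("property 4: red node has a red child")
    lc, rc = black_count(left), black_count(right)
    if lc != rc:
        raise ValueError("property 5: paths contain different numbers of black nodes")
    return lc + (1 if color == BLACK else 0)


def is_red_black(root):
    """Check red-black properties 1-5 for the tree rooted at root."""
    if root is not None and root[0] != BLACK:
        return False                               # property 2: the root is black
    try:
        black_count(root)
    except ValueError:
        return False
    return True


def leaf(color, key):
    return (color, key, None, None)


good = (BLACK, 7, leaf(RED, 3), leaf(RED, 18))
bad = (BLACK, 7, leaf(BLACK, 3), None)             # left paths have one more black node
print(is_red_black(good), is_red_black(bad))       # True False
```

Counting the node itself in black_count keeps the recursion simple; the count that excludes the starting node is obtained by subtracting 1 when that node is black.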

As a matter of convenience in dealing with boundary conditions in red-black
tree code, we use a single sentinel to represent NIL (see page 238). For a red-black
tree T, the sentinel T.nil is an object with the same attributes as an ordinary node
in the tree. Its color attribute is BLACK, and its other attributes (p, left, right,
and key) can take on arbitrary values. As Figure 13.1(b) shows, all pointers to NIL
are replaced by pointers to the sentinel T.nil.

We use the sentinel so that we can treat a NIL child of a node x as an ordinary
node whose parent is x. Although we instead could add a distinct sentinel node
for each NIL in the tree, so that the parent of each NIL is well defined, that
approach would waste space. Instead, we use the one sentinel T.nil to represent all
the NILs: all leaves and the root's parent. The values of the attributes p, left, right,
and key of the sentinel are immaterial, although we may set them during the course
of a procedure for our convenience.

We generally confine our interest to the internal nodes of a red-black tree, since
they  hold  the  key  values.  In  the  remainder  of  this  chapter,  we  omit  the  leaves  when 
we  draw  red-black  trees,  as  shown  in  Figure  13.1(c). 

We  call  the  number  of  black  nodes  on  any  simple  path  from,  but  not  including,  a 
node  x  down  to  a  leaf  the  black-height  of  the  node,  denoted  bh(x).  By  property  5, 
the  notion  of  black-height  is  well  defined,  since  all  descending  simple  paths  from 
the  node  have  the  same  number  of  black  nodes.  We  define  the  black-height  of  a 
red-black  tree  to  be  the  black-height  of  its  root. 

The  following  lemma  shows  why  red-black  trees  make  good  search  trees. 

Lemma  13.1 

A red-black tree with n internal nodes has height at most 2 lg(n + 1).

Proof We start by showing that the subtree rooted at any node x contains at least
$2^{bh(x)} - 1$ internal nodes. We prove this claim by induction on the height of x. If
the height of x is 0, then x must be a leaf (T.nil), and the subtree rooted at x indeed
contains at least $2^{bh(x)} - 1 = 2^0 - 1 = 0$ internal nodes. For the inductive step,
consider a node x that has positive height and is an internal node with two children.
Each child has a black-height of either bh(x) or bh(x) - 1, depending on whether
its color is red or black, respectively. Since the height of a child of x is less than
the height of x itself, we can apply the inductive hypothesis to conclude that each
child has at least $2^{bh(x)-1} - 1$ internal nodes. Thus, the subtree rooted at x contains
at least $(2^{bh(x)-1} - 1) + (2^{bh(x)-1} - 1) + 1 = 2^{bh(x)} - 1$ internal nodes, which proves
the claim.

To complete the proof of the lemma, let h be the height of the tree. According
to property 4, at least half the nodes on any simple path from the root to a leaf, not
including the root, must be black. Consequently, the black-height of the root must
be at least h/2; thus,

\[ n \ge 2^{h/2} - 1 . \]

Moving the 1 to the left-hand side and taking logarithms on both sides yields
lg(n + 1) ≥ h/2, or h ≤ 2 lg(n + 1). ■

Figure 13.1 A red-black tree with black nodes darkened and red nodes shaded. Every node in a
red-black tree is either red or black, the children of a red node are both black, and every simple path
from a node to a descendant leaf contains the same number of black nodes. (a) Every leaf, shown
as a NIL, is black. Each non-NIL node is marked with its black-height; NILs have black-height 0.
(b) The same red-black tree but with each NIL replaced by the single sentinel T.nil, which is always
black, and with black-heights omitted. The root's parent is also the sentinel. (c) The same red-black
tree but with leaves and the root's parent omitted entirely. We shall use this drawing style in the
remainder of this chapter.

As an immediate consequence of this lemma, we can implement the dynamic-set
operations Search, Minimum, Maximum, Successor, and Predecessor
in O(lg n) time on red-black trees, since each can run in O(h) time on a binary
search tree of height h (as shown in Chapter 12) and any red-black tree on n nodes
is a binary search tree with height O(lg n). (Of course, references to NIL in the
algorithms of Chapter 12 would have to be replaced by T.nil.) Although the
algorithms Tree-Insert and Tree-Delete from Chapter 12 run in O(lg n) time
when given a red-black tree as input, they do not directly support the dynamic-set
operations Insert and Delete, since they do not guarantee that the modified
binary search tree will be a red-black tree. We shall see in Sections 13.3 and 13.4,
however, how to support these two operations in O(lg n) time.

Exercises 


13.1-1 

In  the  style  of  Figure  13.1(a),  draw  the  complete  binary  search  tree  of  height  3  on 
the  keys  {1,2,...,  15}.  Add  the  NIL  leaves  and  color  the  nodes  in  three  different 
ways  such  that  the  black-heights  of  the  resulting  red-black  trees  are  2,  3,  and  4. 


13.1-2 

Draw  the  red-black  tree  that  results  after  Tree-Insert  is  called  on  the  tree  in 
Figure  13.1  with  key  36.  If  the  inserted  node  is  colored  red,  is  the  resulting  tree  a 
red-black  tree?  What  if  it  is  colored  black? 


13.1-3 

Let  us  define  a  relaxed  red-black  tree  as  a  binary  search  tree  that  satisfies  red- 
black  properties  1,  3,  4,  and  5.  In  other  words,  the  root  may  be  either  red  or  black. 
Consider  a  relaxed  red-black  tree  T  whose  root  is  red.  If  we  color  the  root  of  T 
black  but  make  no  other  changes  to  T ,  is  the  resulting  tree  a  red-black  tree? 


13.1-4 

Suppose  that  we  “absorb”  every  red  node  in  a  red-black  tree  into  its  black  parent, 
so  that  the  children  of  the  red  node  become  children  of  the  black  parent.  (Ignore 
what  happens  to  the  keys.)  What  are  the  possible  degrees  of  a  black  node  after  all 
its red children are absorbed? What can you say about the depths of the leaves of
the  resulting  tree? 


13.1-5 

Show  that  the  longest  simple  path  from  a  node  x  in  a  red-black  tree  to  a  descendant 
leaf  has  length  at  most  twice  that  of  the  shortest  simple  path  from  node  x  to  a 
descendant  leaf. 


13.1-6 

What is the largest possible number of internal nodes in a red-black tree with
black-height k? What is the smallest possible number?


13.1-7 

Describe a red-black tree on n keys that realizes the largest possible ratio of red
internal nodes to black internal nodes. What is this ratio? What tree has the smallest
possible ratio, and what is the ratio?


13.2  Rotations 

The search-tree operations Tree-Insert and Tree-Delete, when run on a
red-black tree with n keys, take O(lg n) time. Because they modify the tree, the result
may violate the red-black properties enumerated in Section 13.1. To restore these
properties,  we  must  change  the  colors  of  some  of  the  nodes  in  the  tree  and  also 
change  the  pointer  structure. 

We  change  the  pointer  structure  through  rotation,  which  is  a  local  operation  in 
a  search  tree  that  preserves  the  binary-search-tree  property.  Figure  13.2  shows  the 
two  kinds  of  rotations:  left  rotations  and  right  rotations.  When  we  do  a  left  rotation 
on  a  node  x,  we  assume  that  its  right  child  y  is  not  T.nil;  x  may  be  any  node  in 
the  tree  whose  right  child  is  not  T.nil.  The  left  rotation  “pivots”  around  the  link 
from x to y. It makes y the new root of the subtree, with x as y's left child and y's
left child as x's right child.

The pseudocode for Left-Rotate assumes that x.right ≠ T.nil and that the
root's parent is T.nil.
Figure 13.2 The rotation operations on a binary search tree. The operation Left-Rotate(T, x)
transforms the configuration of the two nodes on the right into the configuration on the left by
changing a constant number of pointers. The inverse operation Right-Rotate(T, y) transforms the
configuration on the left into the configuration on the right. The letters α, β, and γ represent arbitrary
subtrees. A rotation operation preserves the binary-search-tree property: the keys in α precede x.key,
which precedes the keys in β, which precede y.key, which precedes the keys in γ.

Left-Rotate(T, x)
 1  y = x.right              // set y
 2  x.right = y.left         // turn y's left subtree into x's right subtree
 3  if y.left ≠ T.nil
 4      y.left.p = x
 5  y.p = x.p                // link x's parent to y
 6  if x.p == T.nil
 7      T.root = y
 8  elseif x == x.p.left
 9      x.p.left = y
10  else x.p.right = y
11  y.left = x               // put x on y's left
12  x.p = y

Figure 13.3 shows an example of how Left-Rotate modifies a binary search
tree. The code for Right-Rotate is symmetric. Both Left-Rotate and
Right-Rotate run in O(1) time. Only pointers are changed by a rotation; all other
attributes in a node remain the same.
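As a concrete illustration, here is a Python sketch of the left rotation using a single sentinel object for T.nil. The class names, the helper new_node, and the five-node example are our own assumptions for the sketch; node colors are omitted because a rotation does not touch them.

```python
class Node:
    def __init__(self, key=None):
        self.key = key
        self.left = self.right = self.p = None


class Tree:
    def __init__(self):
        self.nil = Node()            # the single sentinel standing in for every NIL
        self.root = self.nil


def new_node(T, key):
    n = Node(key)
    n.left = n.right = n.p = T.nil
    return n


def left_rotate(T, x):
    """Assumes x.right is not T.nil and that the root's parent is T.nil."""
    y = x.right                      # set y
    x.right = y.left                 # turn y's left subtree into x's right subtree
    if y.left is not T.nil:
        y.left.p = x
    y.p = x.p                        # link x's parent to y
    if x.p is T.nil:
        T.root = y
    elif x is x.p.left:
        x.p.left = y
    else:
        x.p.right = y
    y.left = x                       # put x on y's left
    x.p = y


def inorder(T, x):
    return [] if x is T.nil else inorder(T, x.left) + [x.key] + inorder(T, x.right)


# x = 5 with left subtree {3} and right child y = 9, whose subtrees are {7} and {11}.
T = Tree()
x, y, a, b, c = (new_node(T, k) for k in (5, 9, 3, 7, 11))
T.root = x
x.left, x.right = a, y
y.left, y.right = b, c
a.p = y.p = x
b.p = c.p = y

before = inorder(T, T.root)
left_rotate(T, x)
print(T.root.key, before == inorder(T, T.root))   # 9 True
```

After the call, y is the root, x is its left child, and the node with key 7 has moved from y's left to x's right, while the inorder listing of keys is unchanged.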

Exercises 

13.2-1

Write pseudocode for Right-Rotate.

13.2-2

Argue that in every n-node binary search tree, there are exactly n - 1 possible
rotations.


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Figure  13 3  An  example  of  how  the  procedure  Left  Rotate(7\  x  )  modifies  a  binary  search  tree. 
Inorder  tree  walks  of  the  input  tree  and  the  modified  tree  produce  the  same  listing  of  key  values. 


13.2-3

Let a, b, and c be arbitrary nodes in subtrees α, β, and γ, respectively, in the left
tree of Figure 13.2. How do the depths of a, b, and c change when a left rotation
is performed on node x in the figure?

13.2-4

Show that any arbitrary n-node binary search tree can be transformed into any other
arbitrary n-node binary search tree using O(n) rotations. (Hint: First show that at
most n - 1 right rotations suffice to transform the tree into a right-going chain.)

13.2-5 ★

We say that a binary search tree T1 can be right-converted to binary search tree T2
if it is possible to obtain T2 from T1 via a series of calls to Right-Rotate. Give
an example of two trees T1 and T2 such that T1 cannot be right-converted to T2.
Then, show that if a tree T1 can be right-converted to T2, it can be right-converted
using O(n^2) calls to Right-Rotate.
13.3 Insertion

We can insert a node into an n-node red-black tree in O(lg n) time. To do so, we
use a slightly modified version of the Tree-Insert procedure (Section 12.3) to
insert node z into the tree T as if it were an ordinary binary search tree, and then we
color z red. (Exercise 13.3-1 asks you to explain why we choose to make node z
red rather than black.) To guarantee that the red-black properties are preserved, we
then call an auxiliary procedure RB-Insert-Fixup to recolor nodes and perform
rotations. The call RB-Insert(T, z) inserts node z, whose key is assumed to have
already been filled in, into the red-black tree T.

RB-Insert(T, z)
 1  y = T.nil
 2  x = T.root
 3  while x ≠ T.nil
 4      y = x
 5      if z.key < x.key
 6          x = x.left
 7      else x = x.right
 8  z.p = y
 9  if y == T.nil
10      T.root = z
11  elseif z.key < y.key
12      y.left = z
13  else y.right = z
14  z.left = T.nil
15  z.right = T.nil
16  z.color = RED
17  RB-Insert-Fixup(T, z)

The procedures Tree-Insert and RB-Insert differ in four ways. First, all
instances of NIL in Tree-Insert are replaced by T.nil. Second, we set z.left
and z.right to T.nil in lines 14-15 of RB-Insert, in order to maintain the
proper tree structure. Third, we color z red in line 16. Fourth, because coloring z
red may cause a violation of one of the red-black properties, we call
RB-Insert-Fixup(T, z) in line 17 of RB-Insert to restore the red-black
properties.




Chapter 13 Red-Black Trees


RB-Insert-Fixup(T, z)
 1  while z.p.color == RED
 2      if z.p == z.p.p.left
 3          y = z.p.p.right
 4          if y.color == RED
 5              z.p.color = BLACK                 // case 1
 6              y.color = BLACK                   // case 1
 7              z.p.p.color = RED                 // case 1
 8              z = z.p.p                         // case 1
 9          else if z == z.p.right
10                  z = z.p                       // case 2
11                  Left-Rotate(T, z)             // case 2
12              z.p.color = BLACK                 // case 3
13              z.p.p.color = RED                 // case 3
14              Right-Rotate(T, z.p.p)            // case 3
15      else (same as then clause
                with “right” and “left” exchanged)
16  T.root.color = BLACK

To understand how RB-Insert-Fixup works, we shall break our examination
of the code into three major steps. First, we shall determine what violations of
the red-black properties are introduced in RB-Insert when node z is inserted
and colored red. Second, we shall examine the overall goal of the while loop in
lines 1-15. Finally, we shall explore each of the three cases¹ within the while
loop’s body and see how they accomplish the goal. Figure 13.4 shows how
RB-Insert-Fixup operates on a sample red-black tree.

Which of the red-black properties might be violated upon the call to
RB-Insert-Fixup? Property 1 certainly continues to hold, as does property 3, since
both children of the newly inserted red node are the sentinel T.nil. Property 5,
which says that the number of black nodes is the same on every simple path from
a given node, is satisfied as well, because node z replaces the (black) sentinel, and
node z is red with sentinel children. Thus, the only properties that might be
violated are property 2, which requires the root to be black, and property 4, which
says that a red node cannot have a red child. Both possible violations are due to z
being colored red. Property 2 is violated if z is the root, and property 4 is violated
if z’s parent is red. Figure 13.4(a) shows a violation of property 4 after the node z
has been inserted.
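
To make the two possible violations concrete, here is a small Python sketch of a checker for properties 2, 4, and 5. The tuple representation (color, key, left, right), with None standing in for the black sentinel, and the names violations, blacks, and leaf are assumptions chosen for this illustration; the keys are arbitrary.

```python
RED, BLACK = "red", "black"

def violations(t):
    """Return which of properties 2, 4, and 5 fail for the tree t.

    A tree is None (the black sentinel) or a tuple (color, key, left, right).
    """
    found = set()
    if t is not None and t[0] == RED:
        found.add(2)                          # property 2: the root is black

    def blacks(n):
        # Number of black nodes on a path from n down to a leaf, counting
        # both n and the sentinel leaf.
        if n is None:
            return 1
        color, _, left, right = n
        if color == RED and any(c is not None and c[0] == RED for c in (left, right)):
            found.add(4)                      # property 4: a red node has black children
        lb, rb = blacks(left), blacks(right)
        if lb != rb:
            found.add(5)                      # property 5: equal black counts on all paths
        return max(lb, rb) + (1 if color == BLACK else 0)

    blacks(t)
    return found

def leaf(color, key):
    return (color, key, None, None)

legal = (BLACK, 11, leaf(RED, 2), leaf(RED, 14))
red_root = leaf(RED, 7)                       # z inserted into an empty tree
red_parent = (BLACK, 11, (RED, 2, leaf(RED, 1), None), leaf(RED, 14))  # z = 1 under red 2
```

Under these assumptions, violations(legal) is empty, violations(red_root) reports only property 2, and violations(red_parent) reports only property 4, matching the two situations described above.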


¹Case 2 falls through into case 3, and so these two cases are not mutually exclusive.
Figure 13.4 The operation of RB-INSERT-FIXUP. (a) A node z after insertion. Because both z
and its parent z.p are red, a violation of property 4 occurs. Since z’s uncle y is red, case 1 in the
code applies. We recolor nodes and move the pointer z up the tree, resulting in the tree shown in (b).
Once again, z and its parent are both red, but z’s uncle y is black. Since z is the right child of z.p,
case 2 applies. We perform a left rotation, and the tree that results is shown in (c). Now, z is the left
child of its parent, and case 3 applies. Recoloring and right rotation yield the tree in (d), which is a
legal red-black tree.
The while loop in lines 1-15 maintains the following three-part invariant at the
start of each iteration of the loop:

a. Node z is red.

b. If z.p is the root, then z.p is black.

c.  If  the  tree  violates  any  of  the  red-black  properties,  then  it  violates  at  most 
one  of  them,  and  the  violation  is  of  either  property  2  or  property  4.  If  the 
tree  violates  property  2,  it  is  because  z  is  the  root  and  is  red.  If  the  tree 
violates  property  4,  it  is  because  both  z  and  z.p  are  red. 

Part (c), which deals with violations of red-black properties, is more central to
showing that RB-Insert-Fixup restores the red-black properties than parts (a)
and (b), which we use along the way to understand situations in the code. Because
we’ll be focusing on node z and nodes near it in the tree, it helps to know from
part (a) that z is red. We shall use part (b) to show that the node z.p.p exists when
we reference it in lines 2, 3, 7, 8, 13, and 14.

Recall that we need to show that a loop invariant is true prior to the first
iteration of the loop, that each iteration maintains the loop invariant, and that the
loop invariant gives us a useful property at loop termination.

We start with the initialization and termination arguments. Then, as we examine
how the body of the loop works in more detail, we shall argue that the loop
maintains the invariant upon each iteration. Along the way, we shall also
demonstrate that each iteration of the loop has two possible outcomes: either the
pointer z moves up the tree, or we perform some rotations and then the loop terminates.

Initialization: Prior to the first iteration of the loop, we started with a red-black
tree with no violations, and we added a red node z. We show that each part of
the invariant holds at the time RB-Insert-Fixup is called:

a. When RB-Insert-Fixup is called, z is the red node that was added.

b. If z.p is the root, then z.p started out black and did not change prior to the
call of RB-Insert-Fixup.

c. We have already seen that properties 1, 3, and 5 hold when RB-Insert-Fixup
is called.

If the tree violates property 2, then the red root must be the newly added
node z, which is the only internal node in the tree. Because the parent and
both children of z are the sentinel, which is black, the tree does not also
violate property 4. Thus, this violation of property 2 is the only violation of
red-black properties in the entire tree.

If the tree violates property 4, then, because the children of node z are black
sentinels and the tree had no other violations prior to z being added, the
violation must be because both z and z.p are red. Moreover, the tree violates
no other red-black properties.

Termination: When the loop terminates, it does so because z.p is black. (If z is
the root, then z.p is the sentinel T.nil, which is black.) Thus, the tree does not
violate property 4 at loop termination. By the loop invariant, the only property
that might fail to hold is property 2. Line 16 restores this property, too, so that
when RB-Insert-Fixup terminates, all the red-black properties hold.

Maintenance: We actually need to consider six cases in the while loop, but three
of them are symmetric to the other three, depending on whether line 2
determines z’s parent z.p to be a left child or a right child of z’s grandparent z.p.p.
We have given the code only for the situation in which z.p is a left child. The
node z.p.p exists, since by part (b) of the loop invariant, if z.p is the root,
then z.p is black. Since we enter a loop iteration only if z.p is red, we know
that z.p cannot be the root. Hence, z.p.p exists.

We distinguish case 1 from cases 2 and 3 by the color of z’s parent’s sibling,
or “uncle.” Line 3 makes y point to z’s uncle z.p.p.right, and line 4 tests y’s
color. If y is red, then we execute case 1. Otherwise, control passes to cases 2
and 3. In all three cases, z’s grandparent z.p.p is black, since its parent z.p is
red, and property 4 is violated only between z and z.p.


Case  1:  z’s  uncle  y  is  red 

Figure 13.5 shows the situation for case 1 (lines 5-8), which occurs when
both z.p and y are red. Because z.p.p is black, we can color both z.p and y
black, thereby fixing the problem of z and z.p both being red, and we can
color z.p.p red, thereby maintaining property 5. We then repeat the while loop
with z.p.p as the new node z. The pointer z moves up two levels in the tree.

Now, we show that case 1 maintains the loop invariant at the start of the next
iteration. We use z to denote node z in the current iteration, and z' = z.p.p
to denote the node that will be called node z at the test in line 1 upon the next
iteration.

a. Because this iteration colors z.p.p red, node z' is red at the start of the next
iteration.

b. The node z'.p is z.p.p.p in this iteration, and the color of this node does not
change. If this node is the root, it was black prior to this iteration, and it
remains black at the start of the next iteration.

c. We have already argued that case 1 maintains property 5, and it does not
introduce a violation of properties 1 or 3.
Figure 13.5 Case 1 of the procedure RB-INSERT-FIXUP. Property 4 is violated, since z and its
parent z.p are both red. We take the same action whether (a) z is a right child or (b) z is a left
child. Each of the subtrees α, β, γ, δ, and ε has a black root, and each has the same black-height.
The code for case 1 changes the colors of some nodes, preserving property 5: all downward simple
paths from a node to a leaf have the same number of blacks. The while loop continues with node z’s
grandparent z.p.p as the new z. Any violation of property 4 can now occur only between the new z,
which is red, and its parent, if it is red as well.


If node z' is the root at the start of the next iteration, then case 1 corrected
the lone violation of property 4 in this iteration. Since z' is red and it is the
root, property 2 becomes the only one that is violated, and this violation is
due to z'.

If node z' is not the root at the start of the next iteration, then case 1 has
not created a violation of property 2. Case 1 corrected the lone violation
of property 4 that existed at the start of this iteration. It then made z' red
and left z'.p alone. If z'.p was black, there is no violation of property 4.
If z'.p was red, coloring z' red created one violation of property 4 between z'
and z'.p.


Case  2:  z’s  uncle  y  is  black  and  z  is  a  right  child 
Case  3:  z’s  uncle  y  is  black  and  z  is  a  left  child 

In cases 2 and 3, the color of z’s uncle y is black. We distinguish the two cases
according to whether z is a right or left child of z.p. Lines 10-11 constitute
case 2, which is shown in Figure 13.6 together with case 3. In case 2, node z
is a right child of its parent. We immediately use a left rotation to transform
the situation into case 3 (lines 12-14), in which node z is a left child. Because
Figure 13.6 Cases 2 and 3 of the procedure RB-INSERT-FIXUP. As in case 1, property 4 is violated
in either case 2 or case 3 because z and its parent z.p are both red. Each of the subtrees α, β, γ, and δ
has a black root (α, β, and γ from property 4, and δ because otherwise we would be in case 1), and
each has the same black-height. We transform case 2 into case 3 by a left rotation, which preserves
property 5: all downward simple paths from a node to a leaf have the same number of blacks. Case 3
causes some color changes and a right rotation, which also preserve property 5. The while loop then
terminates, because property 4 is satisfied: there are no longer two red nodes in a row.

both z and z.p are red, the rotation affects neither the black-height of nodes
nor property 5. Whether we enter case 3 directly or through case 2, z’s uncle y
is black, since otherwise we would have executed case 1. Additionally, the
node z.p.p exists, since we have argued that this node existed at the time that
lines 2 and 3 were executed, and after moving z up one level in line 10 and then
down one level in line 11, the identity of z.p.p remains unchanged. In case 3,
we execute some color changes and a right rotation, which preserve property 5,
and then, since we no longer have two red nodes in a row, we are done. The
while loop does not iterate another time, since z.p is now black.

We now show that cases 2 and 3 maintain the loop invariant. (As we have just
argued, z.p will be black upon the next test in line 1, and the loop body will not
execute again.)

a. Case 2 makes z point to z.p, which is red. No further change to z or its color
occurs in cases 2 and 3.

b. Case 3 makes z.p black, so that if z.p is the root at the start of the next
iteration, it is black.

c. As in case 1, properties 1, 3, and 5 are maintained in cases 2 and 3.

Since node z is not the root in cases 2 and 3, we know that there is no
violation of property 2. Cases 2 and 3 do not introduce a violation of property 2,
since the only node that is made red becomes a child of a black node by the
rotation in case 3.

Cases 2 and 3 correct the lone violation of property 4, and they do not
introduce another violation.
Having shown that each iteration of the loop maintains the invariant, we have
shown that RB-Insert-Fixup correctly restores the red-black properties.

Analysis 

What is the running time of RB-Insert? Since the height of a red-black tree on n
nodes is O(lg n), lines 1-16 of RB-Insert take O(lg n) time. In RB-Insert-Fixup,
the while loop repeats only if case 1 occurs, and then the pointer z moves
two levels up the tree. The total number of times the while loop can be executed
is therefore O(lg n). Thus, RB-Insert takes a total of O(lg n) time. Moreover, it
never performs more than two rotations, since the while loop terminates if case 2
or case 3 is executed.
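
The following Python sketch puts RB-Insert and RB-Insert-Fixup together and checks the claims above empirically on one fixed key order. It is an illustration under stated assumptions, not the text's own code: the classes Node and RBTree, the single rotate helper that takes a side name (so that the symmetric "else" clause need not be written out), the rotations counter, and the checkers black_count and height are all introduced here for the example.

```python
import math

RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, color, nil):
        self.key, self.color = key, color
        self.left = self.right = self.p = nil   # covers lines 14-15 of RB-Insert

class RBTree:
    def __init__(self):
        self.nil = Node(None, BLACK, None)      # the sentinel T.nil, always black
        self.nil.left = self.nil.right = self.nil.p = self.nil
        self.root = self.nil
        self.rotations = 0                      # instrumentation, not in the text

def rotate(T, x, side):
    """side='left' is Left-Rotate(T, x); side='right' is Right-Rotate(T, x)."""
    other = "right" if side == "left" else "left"
    y = getattr(x, other)
    setattr(x, other, getattr(y, side))
    if getattr(y, side) is not T.nil:
        getattr(y, side).p = x
    y.p = x.p
    if x.p is T.nil:
        T.root = y
    elif x is x.p.left:
        x.p.left = y
    else:
        x.p.right = y
    setattr(y, side, x)
    x.p = y
    T.rotations += 1

def rb_insert_fixup(T, z):
    while z.p.color == RED:
        side = "left" if z.p is z.p.p.left else "right"
        other = "right" if side == "left" else "left"
        y = getattr(z.p.p, other)               # z's uncle
        if y.color == RED:                      # case 1: recolor, move z up
            z.p.color = BLACK
            y.color = BLACK
            z.p.p.color = RED
            z = z.p.p
        else:
            if z is getattr(z.p, other):        # case 2: rotate into case 3
                z = z.p
                rotate(T, z, side)
            z.p.color = BLACK                   # case 3: recolor and rotate
            z.p.p.color = RED
            rotate(T, z.p.p, other)
    T.root.color = BLACK

def rb_insert(T, key):
    z = Node(key, RED, T.nil)                   # new node starts red (line 16)
    y, x = T.nil, T.root
    while x is not T.nil:
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y
    if y is T.nil:
        T.root = z
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z
    rb_insert_fixup(T, z)

def black_count(T, x):
    """Black nodes on each path below x (inclusive); asserts properties 4 and 5."""
    if x is T.nil:
        return 0
    if x.color == RED:
        assert x.left.color == BLACK and x.right.color == BLACK
    lc, rc = black_count(T, x.left), black_count(T, x.right)
    assert lc == rc
    return lc + (1 if x.color == BLACK else 0)

def height(T, x):
    """Height counted in edges; the empty tree has height -1."""
    return -1 if x is T.nil else 1 + max(height(T, x.left), height(T, x.right))

T = RBTree()
max_rotations = 0
keys = [(i * 37) % 211 for i in range(1, 211)]  # a fixed permutation of 1..210
for n, key in enumerate(keys, start=1):
    before = T.rotations
    rb_insert(T, key)
    max_rotations = max(max_rotations, T.rotations - before)
    assert T.root.color == BLACK                # property 2
    black_count(T, T.root)                      # properties 4 and 5
    assert height(T, T.root) <= 2 * math.log2(n + 1)
```

The loop re-verifies the properties after every insertion, which is far more work than the O(lg n) insertion itself; it is there only to exercise the argument, and max_rotations records the largest number of rotations any single insertion needed.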

Exercises 


13.3-1

In line 16 of RB-Insert, we set the color of the newly inserted node z to red.
Observe that if we had chosen to set z’s color to black, then property 4 of a
red-black tree would not be violated. Why didn’t we choose to set z’s color to black?

13.3-2

Show the red-black trees that result after successively inserting the keys 41, 38, 31,
12, 19, 8 into an initially empty red-black tree.

13.3-3

Suppose that the black-height of each of the subtrees α, β, γ, δ, ε in Figures 13.5
and 13.6 is k. Label each node in each figure with its black-height to verify that
the indicated transformation preserves property 5.

13.3-4

Professor Teach is concerned that RB-Insert-Fixup might set T.nil.color to
RED, in which case the test in line 1 would not cause the loop to terminate when z
is the root. Show that the professor’s concern is unfounded by arguing that
RB-Insert-Fixup never sets T.nil.color to RED.

13.3-5

Consider a red-black tree formed by inserting n nodes with RB-Insert. Argue
that if n > 1, the tree has at least one red node.

13.3-6

Suggest how to implement RB-Insert efficiently if the representation for
red-black trees includes no storage for parent pointers.
13.4 Deletion

Like the other basic operations on an n-node red-black tree, deletion of a node takes
time O(lg n). Deleting a node from a red-black tree is a bit more complicated than
inserting a node.

The procedure for deleting a node from a red-black tree is based on the
Tree-Delete procedure (Section 12.3). First, we need to customize the Transplant
subroutine that Tree-Delete calls so that it applies to a red-black tree:

RB-Transplant(T, u, v)
1  if u.p == T.nil
2      T.root = v
3  elseif u == u.p.left
4      u.p.left = v
5  else u.p.right = v
6  v.p = u.p

The procedure RB-Transplant differs from Transplant in two ways. First,
line 1 references the sentinel T.nil instead of NIL. Second, the assignment to v.p in
line 6 occurs unconditionally: we can assign to v.p even if v points to the sentinel.
In fact, we shall exploit the ability to assign to v.p when v = T.nil.
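
A minimal Python sketch of RB-Transplant shows the unconditional assignment at work when v is the sentinel. The Node and Tree classes and the two-node example tree are assumptions made for this illustration.

```python
class Node:
    def __init__(self, key, nil=None):
        self.key = key
        self.left = self.right = self.p = nil

class Tree:
    def __init__(self):
        self.nil = Node(None)                 # sentinel standing in for T.nil
        self.nil.left = self.nil.right = self.nil.p = self.nil
        self.root = self.nil

def rb_transplant(T, u, v):
    if u.p is T.nil:
        T.root = v
    elif u is u.p.left:
        u.p.left = v
    else:
        u.p.right = v
    v.p = u.p                                 # unconditional, even when v is T.nil

# Example: a root with a single left child, which is then replaced by its
# (sentinel) right child.
T = Tree()
root, child = Node(10, T.nil), Node(5, T.nil)
T.root, root.left, child.p = root, child, root

rb_transplant(T, child, child.right)          # child.right is T.nil
```

After the call, root.left is the sentinel and T.nil.p points to root, so code that later starts from x = T.nil can still reach the parent through x.p.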

The procedure RB-Delete is like the Tree-Delete procedure, but with
additional lines of pseudocode. Some of the additional lines keep track of a node y
that might cause violations of the red-black properties. When we want to delete
node z and z has fewer than two children, then z is removed from the tree, and we
want y to be z. When z has two children, then y should be z’s successor, and y
moves into z’s position in the tree. We also remember y’s color before it is
removed from or moved within the tree, and we keep track of the node x that moves
into y’s original position in the tree, because node x might also cause violations
of the red-black properties. After deleting node z, RB-Delete calls an auxiliary
procedure RB-Delete-Fixup, which changes colors and performs rotations to
restore the red-black properties.
RB-Delete(T, z)
 1  y = z
 2  y-original-color = y.color
 3  if z.left == T.nil
 4      x = z.right
 5      RB-Transplant(T, z, z.right)
 6  elseif z.right == T.nil
 7      x = z.left
 8      RB-Transplant(T, z, z.left)
 9  else y = Tree-Minimum(z.right)
10      y-original-color = y.color
11      x = y.right
12      if y.p == z
13          x.p = y
14      else RB-Transplant(T, y, y.right)
15          y.right = z.right
16          y.right.p = y
17      RB-Transplant(T, z, y)
18      y.left = z.left
19      y.left.p = y
20      y.color = z.color
21  if y-original-color == BLACK
22      RB-Delete-Fixup(T, x)

Although RB-Delete contains almost twice as many lines of pseudocode as
Tree-Delete, the two procedures have the same basic structure. You can find
each line of Tree-Delete within RB-Delete (with the changes of replacing
NIL by T.nil and replacing calls to Transplant by calls to RB-Transplant),
executed under the same conditions.

Here  are  the  other  differences  between  the  two  procedures: 

• We maintain node y as the node either removed from the tree or moved within
the tree. Line 1 sets y to point to node z when z has fewer than two children
and is therefore removed. When z has two children, line 9 sets y to point to z’s
successor, just as in Tree-Delete, and y will move into z’s position in the
tree.

• Because node y’s color might change, the variable y-original-color stores y’s
color before any changes occur. Lines 2 and 10 set this variable immediately
after assignments to y. When z has two children, then y ≠ z and node y
moves into node z’s original position in the red-black tree; line 20 gives y the
same color as z. We need to save y’s original color in order to test it at the
end of RB-Delete; if it was black, then removing or moving y could cause
violations of the red-black properties.

• As discussed, we keep track of the node x that moves into node y’s original
position. The assignments in lines 4, 7, and 11 set x to point to either y’s only
child or, if y has no children, the sentinel T.nil. (Recall from Section 12.3
that y has no left child.)

• Since node x moves into node y’s original position, the attribute x.p is always
set to point to the original position in the tree of y’s parent, even if x is, in fact,
the sentinel T.nil. Unless z is y’s original parent (which occurs only when z has
two children and its successor y is z’s right child), the assignment to x.p takes
place in line 6 of RB-Transplant. (Observe that when RB-Transplant
is called in lines 5, 8, or 14, the second parameter passed is the same as x.)

When y’s original parent is z, however, we do not want x.p to point to y’s
original parent, since we are removing that node from the tree. Because node y will
move up to take z’s position in the tree, setting x.p to y in line 13 causes x.p
to point to the original position of y’s parent, even if x = T.nil.

• Finally, if node y was black, we might have introduced one or more violations
of the red-black properties, and so we call RB-Delete-Fixup in line 22 to
restore the red-black properties. If y was red, the red-black properties still hold
when y is removed or moved, for the following reasons:

1. No black-heights in the tree have changed.

2. No red nodes have been made adjacent. Because y takes z’s place in the
tree, along with z’s color, we cannot have two adjacent red nodes at y’s new
position in the tree. In addition, if y was not z’s right child, then y’s original
right child x replaces y in the tree. If y is red, then x must be black, and so
replacing y by x cannot cause two red nodes to become adjacent.

3. Since y could not have been the root if it was red, the root remains black.
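
The bookkeeping of y, y-original-color, and x can be traced in a Python sketch of lines 1-20 of RB-Delete. To keep the example short, the function below returns x and y's original color instead of calling the fixup procedure; that return convention, the name rb_delete_no_fixup, the Node and Tree classes, the link helper, and the four-node example tree are all assumptions made for this illustration.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, color, nil=None):
        self.key, self.color = key, color
        self.left = self.right = self.p = nil

class Tree:
    def __init__(self):
        self.nil = Node(None, BLACK)          # sentinel standing in for T.nil
        self.nil.left = self.nil.right = self.nil.p = self.nil
        self.root = self.nil

def rb_transplant(T, u, v):
    if u.p is T.nil:
        T.root = v
    elif u is u.p.left:
        u.p.left = v
    else:
        u.p.right = v
    v.p = u.p

def tree_minimum(T, x):
    while x.left is not T.nil:
        x = x.left
    return x

def rb_delete_no_fixup(T, z):
    """Lines 1-20 of RB-Delete; returns (x, y_original_color) for the caller."""
    y = z
    y_original_color = y.color
    if z.left is T.nil:
        x = z.right
        rb_transplant(T, z, z.right)
    elif z.right is T.nil:
        x = z.left
        rb_transplant(T, z, z.left)
    else:
        y = tree_minimum(T, z.right)          # z's successor
        y_original_color = y.color
        x = y.right
        if y.p is z:
            x.p = y                           # needed when x is T.nil
        else:
            rb_transplant(T, y, y.right)
            y.right = z.right
            y.right.p = y
        rb_transplant(T, z, y)
        y.left = z.left
        y.left.p = y
        y.color = z.color
    return x, y_original_color

def link(parent, child, side):
    """Hypothetical helper for building the example tree by hand."""
    setattr(parent, side, child)
    child.p = parent

# Black root 10 with black children 5 and 20; 20 has a red left child 15.
T = Tree()
n10, n5, n20, n15 = (Node(k, c, T.nil) for k, c in
                     ((10, BLACK), (5, BLACK), (20, BLACK), (15, RED)))
T.root = n10
link(n10, n5, "left"); link(n10, n20, "right"); link(n20, n15, "left")

x, y_original_color = rb_delete_no_fixup(T, n10)   # z has two children; y is 15
```

In this example the successor y = 15 was red, so no fixup is needed: node 15 takes over position and color from node 10, x is the sentinel, and x.p records y's original parent, node 20.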

If node y was black, three problems may arise, which the call of
RB-Delete-Fixup will remedy. First, if y had been the root and a red child of y becomes the
new root, we have violated property 2. Second, if both x and x.p are red, then
we have violated property 4. Third, moving y within the tree causes any simple
path that previously contained y to have one fewer black node. Thus, property 5
is now violated by any ancestor of y in the tree. We can correct the violation
of property 5 by saying that node x, now occupying y’s original position, has an
“extra” black. That is, if we add 1 to the count of black nodes on any simple path
that contains x, then under this interpretation, property 5 holds. When we remove
or move the black node y, we “push” its blackness onto node x. The problem is
that now node x is neither red nor black, thereby violating property 1. Instead,
node x is either “doubly black” or “red-and-black,” and it contributes either 2 or 1,
respectively, to the count of black nodes on simple paths containing x. The color
attribute of x will still be either RED (if x is red-and-black) or BLACK (if x is
doubly black). In other words, the extra black on a node is reflected in x’s pointing
to the node rather than in the color attribute.

We can now see the procedure RB-Delete-Fixup and examine how it restores
the red-black properties to the search tree.

RB-Delete-Fixup(T, x)
 1  while x ≠ T.root and x.color == BLACK
 2      if x == x.p.left
 3          w = x.p.right
 4          if w.color == RED
 5              w.color = BLACK                                   // case 1
 6              x.p.color = RED                                   // case 1
 7              Left-Rotate(T, x.p)                               // case 1
 8              w = x.p.right                                     // case 1
 9          if w.left.color == BLACK and w.right.color == BLACK
10              w.color = RED                                     // case 2
11              x = x.p                                           // case 2
12          else if w.right.color == BLACK
13                  w.left.color = BLACK                          // case 3
14                  w.color = RED                                 // case 3
15                  Right-Rotate(T, w)                            // case 3
16                  w = x.p.right                                 // case 3
17              w.color = x.p.color                               // case 4
18              x.p.color = BLACK                                 // case 4
19              w.right.color = BLACK                             // case 4
20              Left-Rotate(T, x.p)                               // case 4
21              x = T.root                                        // case 4
22      else (same as then clause with “right” and “left” exchanged)
23  x.color = BLACK

The procedure RB-Delete-Fixup restores properties 1, 2, and 4. Exercises
13.4-1 and 13.4-2 ask you to show that the procedure restores properties 2 and 4,
and so in the remainder of this section, we shall focus on property 1. The goal of
the while loop in lines 1-22 is to move the extra black up the tree until

1.  x  points  to  a  red-and-black  node,  in  which  case  we  color  x  (singly)  black  in 
line  23; 

2.  x  points  to  the  root,  in  which  case  we  simply  “remove”  the  extra  black;  or 

3.  having  performed  suitable  rotations  and  recolorings,  we  exit  the  loop. 
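
The three ways of disposing of the extra black can be tried out with a Python transcription of RB-Delete-Fixup. This is a sketch under stated assumptions: the Node and Tree classes, the rotate helper that takes a side name (used so that the symmetric "else" clause need not be written out), and the helper after_deleting_black_left_leaf, which hand-builds the state of a small tree just after a black left leaf has been removed, are all introduced here and are not from the text.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, color, nil=None):
        self.key, self.color = key, color
        self.left = self.right = self.p = nil

class Tree:
    def __init__(self):
        self.nil = Node(None, BLACK)          # sentinel standing in for T.nil
        self.nil.left = self.nil.right = self.nil.p = self.nil
        self.root = self.nil

def rotate(T, x, side):
    """side='left' is Left-Rotate(T, x); side='right' is Right-Rotate(T, x)."""
    other = "right" if side == "left" else "left"
    y = getattr(x, other)
    setattr(x, other, getattr(y, side))
    if getattr(y, side) is not T.nil:
        getattr(y, side).p = x
    y.p = x.p
    if x.p is T.nil:
        T.root = y
    elif x is x.p.left:
        x.p.left = y
    else:
        x.p.right = y
    setattr(y, side, x)
    x.p = y

def rb_delete_fixup(T, x):
    while x is not T.root and x.color == BLACK:
        side = "left" if x is x.p.left else "right"
        other = "right" if side == "left" else "left"
        w = getattr(x.p, other)               # x's sibling
        if w.color == RED:                    # case 1
            w.color = BLACK
            x.p.color = RED
            rotate(T, x.p, side)
            w = getattr(x.p, other)
        if getattr(w, side).color == BLACK and getattr(w, other).color == BLACK:
            w.color = RED                     # case 2: move the extra black up
            x = x.p
        else:
            if getattr(w, other).color == BLACK:   # case 3
                getattr(w, side).color = BLACK
                w.color = RED
                rotate(T, w, other)
                w = getattr(x.p, other)
            w.color = x.p.color               # case 4
            x.p.color = BLACK
            getattr(w, other).color = BLACK
            rotate(T, x.p, side)
            x = T.root
    x.color = BLACK

def after_deleting_black_left_leaf(sibling_has_red_right_child):
    """Black root 10 and black right child 20, with the black left leaf gone."""
    T = Tree()
    root, w = Node(10, BLACK, T.nil), Node(20, BLACK, T.nil)
    T.root, root.right, w.p = root, w, root
    if sibling_has_red_right_child:
        w.right = Node(30, RED, T.nil)
        w.right.p = w
    T.nil.p = root                            # as left behind by RB-Transplant
    return T

T2 = after_deleting_black_left_leaf(False)
rb_delete_fixup(T2, T2.nil)                   # case 2, then x reaches the root
T4 = after_deleting_black_left_leaf(True)
rb_delete_fixup(T4, T4.nil)                   # case 4: recolor, rotate, exit
```

In the first tree the sibling 20 turns red and the extra black is simply dropped at the root; in the second, one recoloring and one left rotation make 20 the root with black children 10 and 30.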
Within the while loop, x always points to a nonroot doubly black node. We
determine in line 2 whether x is a left child or a right child of its parent x.p. (We
have given the code for the situation in which x is a left child; the situation in
which x is a right child, in line 22, is symmetric.) We maintain a pointer w to
the sibling of x. Since node x is doubly black, node w cannot be T.nil, because
otherwise, the number of blacks on the simple path from x.p to the (singly black)
leaf w would be smaller than the number on the simple path from x.p to x.

The four cases² in the code appear in Figure 13.7. Before examining each case
in detail, let’s look more generally at how we can verify that the transformation
in each of the cases preserves property 5. The key idea is that in each case, the
transformation applied preserves the number of black nodes (including x’s extra
black) from (and including) the root of the subtree shown to each of the subtrees
α, β, ..., ζ. Thus, if property 5 holds prior to the transformation, it continues to
hold afterward. For example, in Figure 13.7(a), which illustrates case 1, the
number of black nodes from the root to either subtree α or β is 3, both before and after
the transformation. (Again, remember that node x adds an extra black.) Similarly,
the number of black nodes from the root to any of γ, δ, ε, and ζ is 2, both
before and after the transformation. In Figure 13.7(b), the counting must involve the
value c of the color attribute of the root of the subtree shown, which can be either
RED or BLACK. If we define count(RED) = 0 and count(BLACK) = 1, then the
number of black nodes from the root to α is 2 + count(c), both before and after
the transformation. In this case, after the transformation, the new node x has color
attribute c, but this node is really either red-and-black (if c = RED) or doubly black
(if c = BLACK). You can verify the other cases similarly (see Exercise 13.4-5).

Case  1:  x’s  sibling  w  is  red 

Case  1  (lines  5-8  of  RB-Delete-Fixup  and  Figure  13.7(a))  occurs  when  node  w , 
the  sibling  of  node  x,  is  red.  Since  w  must  have  black  children,  we  can  switch  the 
colors  of  w  and  x.p  and  then  perform  a  left-rotation  on  x.p  without  violating  any 
of  the  red-black  properties.  The  new  sibling  of  x,  which  is  one  of  w’s  children 
prior  to  the  rotation,  is  now  black,  and  thus  we  have  converted  case  1  into  case  2, 
3,  or  4. 

Cases  2,  3,  and  4  occur  when  node  w  is  black;  they  are  distinguished  by  the 
colors of w’s children.


²As in RB-Insert-Fixup, the cases in RB-Delete-Fixup are not mutually exclusive.

Case 2: x’s sibling w is black, and both of w’s children are black

In case 2 (lines 10-11 of RB-Delete-Fixup and Figure 13.7(b)), both of w’s
children  are  black.  Since  w  is  also  black,  we  take  one  black  off  both  x  and  w, 
leaving  x  with  only  one  black  and  leaving  w  red.  To  compensate  for  removing 
one  black  from  x  and  w,  we  would  like  to  add  an  extra  black  to  x.p,  which  was 
originally  either  red  or  black.  We  do  so  by  repeating  the  while  loop  with  x.p  as 
the  new  node  x.  Observe  that  if  we  enter  case  2  through  case  1,  the  new  node  x 
is  red-and-black,  since  the  original  x.p  was  red.  Hence,  the  value  c  of  the  color 
attribute  of  the  new  node  x  is  RED,  and  the  loop  terminates  when  it  tests  the  loop 
condition.  We  then  color  the  new  node  x  (singly)  black  in  line  23. 

Case 3: x’s sibling w is black, w’s left child is red, and w’s right child is black

Case 3 (lines 13-16 and Figure 13.7(c)) occurs when w is black, its left child
is red, and its right child is black. We can switch the colors of w and its left
child w.left and then perform a right rotation on w without violating any of the
red-black  properties.  The  new  sibling  w  of  x  is  now  a  black  node  with  a  red  right 
child,  and  thus  we  have  transformed  case  3  into  case  4. 

Case 4: x’s sibling w is black, and w’s right child is red

Case 4 (lines 17-21 and Figure 13.7(d)) occurs when node x’s sibling w is black
and w’s right child is red. By making some color changes and performing a left
rotation on x.p, we can remove the extra black on x, making it singly black, without
violating  any  of  the  red-black  properties.  Setting  x  to  be  the  root  causes  the  while 
loop  to  terminate  when  it  tests  the  loop  condition. 

Analysis 

What is the running time of RB-Delete? Since the height of a red-black tree of n
nodes is O(lg n), the total cost of the procedure without the call to
RB-Delete-Fixup is O(lg n). Within RB-Delete-Fixup, each of cases 1, 3, and 4
leads to termination after performing a constant number of color changes and at
most three rotations. Case 2 is the only case in which the while loop can be
repeated, and then the pointer x moves up the tree at most O(lg n) times, performing
no rotations. Thus, the procedure RB-Delete-Fixup takes O(lg n) time and
performs at most three rotations, and the overall time for RB-Delete is therefore
also O(lg n).
Figure 13.7 The cases in the while loop of the procedure RB-Delete-Fixup. Darkened nodes
have color attributes BLACK, heavily shaded nodes have color attributes RED, and lightly shaded
nodes have color attributes represented by c and c', which may be either RED or BLACK. The letters
α, β, ..., ζ represent arbitrary subtrees. Each case transforms the configuration on the left into the
configuration on the right by changing some colors and/or performing a rotation. Any node pointed
to by x has an extra black and is either doubly black or red-and-black. Only case 2 causes the loop to
repeat. (a) Case 1 is transformed to case 2, 3, or 4 by exchanging the colors of nodes B and D and
performing a left rotation. (b) In case 2, the extra black represented by the pointer x moves up the
tree by coloring node D red and setting x to point to node B. If we enter case 2 through case 1, the
while loop terminates because the new node x is red-and-black, and therefore the value c of its color
attribute is RED. (c) Case 3 is transformed to case 4 by exchanging the colors of nodes C and D and
performing a right rotation. (d) Case 4 removes the extra black represented by x by changing some
colors and performing a left rotation (without violating the red-black properties), and then the loop
terminates.
Exercises


13.4-1

Argue that after executing RB-Delete-Fixup, the root of the tree must be black.


13.4-2

Argue that if in RB-Delete both x and x.p are red, then property 4 is restored by
the call to RB-Delete-Fixup(T, x).

13.4-3

In Exercise 13.3-2, you found the red-black tree that results from successively
inserting the keys 41, 38, 31, 12, 19, 8 into an initially empty tree. Now show the
red-black trees that result from the successive deletion of the keys in the order
8, 12, 19, 31, 38, 41.


13.4-4 

In which lines of the code for RB-Delete-Fixup might we examine or modify
the sentinel T.nil?


13.4-5

In each of the cases of Figure 13.7, give the count of black nodes from the root of
the subtree shown to each of the subtrees α, β, ..., ζ, and verify that each count
remains the same after the transformation. When a node has a color attribute c
or c', use the notation count(c) or count(c') symbolically in your count.

13.4-6

Professors Skelton and Baron are concerned that at the start of case 1 of
RB-Delete-Fixup, the node x.p might not be black. If the professors are correct,
then  lines  5-6  are  wrong.  Show  that  x.p  must  be  black  at  the  start  of  case  1,  so  that 
the  professors  have  nothing  to  worry  about. 


13.4-7 

Suppose that a node x is inserted into a red-black tree with RB-Insert and then
is immediately deleted with RB-Delete. Is the resulting red-black tree the same
as  the  initial  red-black  tree?  Justify  your  answer. 
13-1 Persistent dynamic sets

During  the  course  of  an  algorithm,  we  sometimes  find  that  we  need  to  maintain  past 
versions  of  a  dynamic  set  as  it  is  updated.  We  call  such  a  set  persistent.  One  way  to 
implement  a  persistent  set  is  to  copy  the  entire  set  whenever  it  is  modified,  but  this 
approach  can  slow  down  a  program  and  also  consume  much  space.  Sometimes,  we 
can  do  much  better. 

Consider  a  persistent  set  S  with  the  operations  Insert,  Delete,  and  Search, 
which  we  implement  using  binary  search  trees  as  shown  in  Figure  13.8(a).  We 
maintain  a  separate  root  for  every  version  of  the  set.  In  order  to  insert  the  key  5 
into  the  set,  we  create  a  new  node  with  key  5.  This  node  becomes  the  left  child 
of  a  new  node  with  key  7,  since  we  cannot  modify  the  existing  node  with  key  7. 
Similarly,  the  new  node  with  key  7  becomes  the  left  child  of  a  new  node  with 
key  8  whose  right  child  is  the  existing  node  with  key  10.  The  new  node  with  key  8 
becomes,  in  turn,  the  right  child  of  a  new  root  r'  with  key  4  whose  left  child  is  the 
existing  node  with  key  3.  We  thus  copy  only  part  of  the  tree  and  share  some  of  the 
nodes  with  the  original  tree,  as  shown  in  Figure  13.8(b). 

Assume  that  each  tree  node  has  the  attributes  key,  left,  and  right  but  no  parent. 
(See  also  Exercise  13.3-6.) 
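
The path-copying idea described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive solution: the names Node, persistent_insert, and inorder are hypothetical, nodes are made immutable so that old versions cannot be modified by accident, and the starting tree is assumed to have the shape implied by the description (root 4, left child 3 with left child 2, right child 8 with children 7 and 10).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """An immutable tree node with key, left, and right but no parent."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def persistent_insert(root, key):
    """Return the root of a new version that contains key.

    Only the nodes on the search path are copied; every subtree that the
    search does not enter is shared with the previous version.
    """
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, persistent_insert(root.left, key), root.right)
    return Node(root.key, root.left, persistent_insert(root.right, key))


def inorder(root):
    """Return the keys reachable from root in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


# Previous version r with keys 2, 3, 4, 7, 8, 10 (assumed shape).
r = Node(4, Node(3, Node(2)), Node(8, Node(7), Node(10)))
# New version r_new after inserting key 5; r itself is unchanged.
r_new = persistent_insert(r, 5)
```

In this sketch the new version shares the existing nodes with keys 3 and 10 with the old one, while the nodes with keys 4, 8, and 7 on the search path are fresh copies.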


Figure 13.8 (a) A binary search tree with keys 2, 3, 4, 7, 8, 10. (b) The persistent binary search
tree that results from the insertion of key 5. The most recent version of the set consists of the nodes
reachable from the root r', and the previous version consists of the nodes reachable from r. Heavily
shaded nodes are added when key 5 is inserted.

a. For a general persistent binary search tree, identify the nodes that we need to
change  to  insert  a  key  k  or  delete  a  node  y. 

b. Write a procedure Persistent-Tree-Insert that, given a persistent tree T
and a key k to insert, returns a new persistent tree T' that is the result of
inserting k into T.

c.  If  the  height  of  the  persistent  binary  search  tree  T  is  h,  what  are  the  time  and 
space  requirements  of  your  implementation  of  Persistent-Tree-Insert? 
(The  space  requirement  is  proportional  to  the  number  of  new  nodes  allocated.) 

d.  Suppose  that  we  had  included  the  parent  attribute  in  each  node.  In  this  case, 
Persistent-Tree-Insert  would  need  to  perform  additional  copying.  Prove 
that Persistent-Tree-Insert would then require Ω(n) time and space,
where  n  is  the  number  of  nodes  in  the  tree. 

e.  Show  how  to  use  red-black  trees  to  guarantee  that  the  worst-case  running  time 
and space are O(lg n) per insertion or deletion.

13-2  Join  operation  on  red-black  trees 

The join operation takes two dynamic sets S1 and S2 and an element x such that
for any x1 ∈ S1 and x2 ∈ S2, we have x1.key ≤ x.key ≤ x2.key. It returns a set
S = S1 ∪ {x} ∪ S2. In this problem, we investigate how to implement the join
operation  on  red-black  trees. 

a.  Given  a  red-black  tree  T,  let  us  store  its  black-height  as  the  new  attribute  T.bh. 
Argue that RB-Insert and RB-Delete can maintain the bh attribute without
requiring extra storage in the nodes of the tree and without increasing the
asymptotic running times. Show that while descending through T, we can
determine the black-height of each node we visit in O(1) time per node visited.

We wish to implement the operation RB-Join(T1, x, T2), which destroys T1 and T2
and returns a red-black tree T = T1 ∪ {x} ∪ T2. Let n be the total number of nodes
in T1 and T2.

b. Assume that T1.bh ≥ T2.bh. Describe an O(lg n)-time algorithm that finds a
black node y in T1 with the largest key from among those nodes whose
black-height is T2.bh.

c. Let Ty be the subtree rooted at y. Describe how Ty ∪ {x} ∪ T2 can replace Ty
in O(1) time without destroying the binary-search-tree property.

d. What color should we make x so that red-black properties 1, 3, and 5 are
maintained? Describe how to enforce properties 2 and 4 in O(lg n) time.

e. Argue that no generality is lost by making the assumption in part (b). Describe
the symmetric situation that arises when T1.bh ≤ T2.bh.

f. Argue that the running time of RB-Join is O(lg n).

13-3  AVL  trees 

An AVL tree is a binary search tree that is height balanced: for each node x, the
heights of the left and right subtrees of x differ by at most 1. To implement an AVL
tree,  we  maintain  an  extra  attribute  in  each  node:  x.h  is  the  height  of  node  x.  As 
for  any  other  binary  search  tree  T,  we  assume  that  T.root  points  to  the  root  node. 

a. Prove that an AVL tree with n nodes has height O(lg n). (Hint: Prove that
an AVL tree of height h has at least F_h nodes, where F_h is the hth Fibonacci
number.)

b. To insert into an AVL tree, we first place a node into the appropriate place in
binary search tree order. Afterward, the tree might no longer be height balanced.
Specifically, the heights of the left and right children of some node might differ
by 2. Describe a procedure Balance(x), which takes a subtree rooted at x
whose left and right children are height balanced and have heights that differ
by at most 2, i.e., |x.right.h − x.left.h| ≤ 2, and alters the subtree rooted at x
to be height balanced. (Hint: Use rotations.)

c. Using part (b), describe a recursive procedure AVL-Insert(x, z) that takes
a node x within an AVL tree and a newly created node z (whose key has
already been filled in), and adds z to the subtree rooted at x, maintaining the
property that x is the root of an AVL tree. As in Tree-Insert from
Section 12.3, assume that z.key has already been filled in and that z.left = NIL
and z.right = NIL; also assume that z.h = 0. Thus, to insert the node z into
the AVL tree T, we call AVL-Insert(T.root, z).

d. Show that AVL-Insert, run on an n-node AVL tree, takes O(lg n) time and
performs O(1) rotations.

13-4  Treaps 

If  we  insert  a  set  of  n  items  into  a  binary  search  tree,  the  resulting  tree  may  be 
horribly  unbalanced,  leading  to  long  search  times.  As  we  saw  in  Section  12.4, 
however,  randomly  built  binary  search  trees  tend  to  be  balanced.  Therefore,  one 
strategy  that,  on  average,  builds  a  balanced  tree  for  a  fixed  set  of  items  would  be  to 
randomly  permute  the  items  and  then  insert  them  in  that  order  into  the  tree. 

What  if  we  do  not  have  all  the  items  at  once?  If  we  receive  the  items  one  at  a 
time,  can  we  still  randomly  build  a  binary  search  tree  out  of  them? 
Figure 13.9 A treap. Each node x is labeled with x.key: x.priority. For example, the root has
key G and priority 4.

We  will  examine  a  data  structure  that  answers  this  question  in  the  affirmative.  A 
treap  is  a  binary  search  tree  with  a  modified  way  of  ordering  the  nodes.  Figure  13.9 
shows  an  example.  As  usual,  each  node  x  in  the  tree  has  a  key  value  x.key.  In 
addition, we assign x.priority, which is a random number chosen independently
for each node. We assume that all priorities are distinct and also that all keys are
distinct. The nodes of the treap are ordered so that the keys obey the
binary-search-tree property and the priorities obey the min-heap order property:

•  If  v  is  a  left  child  of  u,  then  v.key  <  u.key. 

•  If  v  is  a  right  child  of  u,  then  v.key  >  u.key. 

• If v is a child of u, then v.priority > u.priority.

(This  combination  of  properties  is  why  the  tree  is  called  a  “treap”:  it  has  features 
of  both  a  binary  search  tree  and  a  heap.) 

It helps to think of treaps in the following way. Suppose that we insert nodes
x_1, x_2, ..., x_n, with associated keys, into a treap. Then the resulting treap is the
tree that would have been formed if the nodes had been inserted into a normal
binary search tree in the order given by their (randomly chosen) priorities, i.e.,
x_i.priority < x_j.priority means that we had inserted x_i before x_j.
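
This view of a treap can be tried out directly. The Python sketch below is an illustration under assumptions: it inserts nodes into an ordinary binary search tree in increasing priority order and then checks that the result obeys both treap properties. The names Node, bst_insert, build_by_priority, and is_treap are hypothetical, and the keys and priorities are sample values chosen for the example (only the root, key G with priority 4, is taken from the caption of Figure 13.9).

```python
class Node:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None


def bst_insert(root, node):
    """Ordinary unbalanced binary-search-tree insertion; returns the root."""
    if root is None:
        return node
    cur = root
    while True:
        if node.key < cur.key:
            if cur.left is None:
                cur.left = node
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = node
                return root
            cur = cur.right


def build_by_priority(items):
    """Insert (key, priority) pairs in order of increasing priority."""
    root = None
    for key, priority in sorted(items, key=lambda kp: kp[1]):
        root = bst_insert(root, Node(key, priority))
    return root


def is_treap(node, lo=None, hi=None, parent_priority=None):
    """Check the BST property on keys and min-heap order on priorities."""
    if node is None:
        return True
    if lo is not None and node.key <= lo:
        return False
    if hi is not None and node.key >= hi:
        return False
    if parent_priority is not None and node.priority <= parent_priority:
        return False
    return (is_treap(node.left, lo, node.key, node.priority)
            and is_treap(node.right, node.key, hi, node.priority))


# Sample data (an assumption for illustration), all keys and priorities distinct.
items = [("G", 4), ("B", 7), ("H", 5), ("A", 10), ("E", 23), ("K", 65), ("I", 73)]
root = build_by_priority(items)
```

Because the node with the smallest priority is inserted first, it becomes the root, and every later node ends up below nodes of smaller priority, which is exactly the min-heap order property.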

a. Show that given a set of nodes x_1, x_2, ..., x_n, with associated keys and
priorities, all distinct, the treap associated with these nodes is unique.

b. Show that the expected height of a treap is O(lg n), and hence the expected time
to search for a value in the treap is O(lg n).

Let  us  see  how  to  insert  a  new  node  into  an  existing  treap.  The  first  thing  we  do 
is  assign  to  the  new  node  a  random  priority.  Then  we  call  the  insertion  algorithm, 
which we call Treap-Insert, whose operation is illustrated in Figure 13.10.

Figure 13.10 The operation of Treap-Insert. (a) The original treap, prior to insertion. (b) The
treap after inserting a node with key C and priority 25. (c)–(d) Intermediate stages when inserting a
node with key D and priority 9. (e) The treap after the insertion of parts (c) and (d) is done. (f) The
treap after inserting a node with key F and priority 2.

Figure 13.11 Spines of a binary search tree. The left spine is shaded in (a), and the right spine is
shaded in (b).

c. Explain how Treap-Insert works. Explain the idea in English and give
pseudocode. (Hint: Execute the usual binary-search-tree insertion procedure and
then perform rotations to restore the min-heap order property.)

d. Show that the expected running time of Treap-Insert is O(lg n).

Treap-Insert  performs  a  search  and  then  a  sequence  of  rotations.  Although 
these  two  operations  have  the  same  expected  running  time,  they  have  different 
costs  in  practice.  A  search  reads  information  from  the  treap  without  modifying  it. 
In  contrast,  a  rotation  changes  parent  and  child  pointers  within  the  treap.  On  most 
computers,  read  operations  are  much  faster  than  write  operations.  Thus  we  would 
like Treap-Insert to perform few rotations. We will show that the expected
number  of  rotations  performed  is  bounded  by  a  constant. 

In  order  to  do  so,  we  will  need  some  definitions,  which  Figure  13.11  depicts. 
The  left  spine  of  a  binary  search  tree  T  is  the  simple  path  from  the  root  to  the  node 
with  the  smallest  key.  In  other  words,  the  left  spine  is  the  simple  path  from  the 
root  that  consists  of  only  left  edges.  Symmetrically,  the  right  spine  of  T  is  the 
simple  path  from  the  root  consisting  of  only  right  edges.  The  length  of  a  spine  is 
the  number  of  nodes  it  contains. 
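
The two spine lengths are easy to compute by following child pointers. The following Python sketch is illustrative only; the names Node, left_spine_length, and right_spine_length are hypothetical, and None stands in for a missing child.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def left_spine_length(root):
    """Number of nodes on the path from root that follows only left edges."""
    count = 0
    while root is not None:
        count += 1
        root = root.left
    return count


def right_spine_length(root):
    """Number of nodes on the path from root that follows only right edges."""
    count = 0
    while root is not None:
        count += 1
        root = root.right
    return count


# Sample tree: left spine is 5, 3, 1 and right spine is 5, 8.
tree = Node(5, Node(3, Node(1), Node(4)), Node(8))
```

For an empty tree both functions return 0, which agrees with defining the length of a spine as the number of nodes it contains.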

e.  Consider  the  treap  T  immediately  after  Treap-Insert  has  inserted  node  x. 
Let  C  be  the  length  of  the  right  spine  of  the  left  subtree  of  x.  Let  D  be  the 
length  of  the  left  spine  of  the  right  subtree  of  x.  Prove  that  the  total  number  of 
rotations  that  were  performed  during  the  insertion  of  x  is  equal  to  C  +  D. 

We  will  now  calculate  the  expected  values  of  C  and  D.  Without  loss  of  generality, 
we assume that the keys are 1, 2, ..., n, since we are comparing them only to one
another. 
For nodes x and y in treap T, where y ≠ x, let k = x.key and i = y.key. We
define indicator random variables

X_{ik} = I{y is in the right spine of the left subtree of x} .

f. Show that X_{ik} = 1 if and only if y.priority > x.priority, y.key < x.key, and,
for every z such that y.key < z.key < x.key, we have y.priority < z.priority.


g. Show that

$$\Pr\{X_{ik} = 1\} = \frac{(k-i-1)!}{(k-i+1)!} = \frac{1}{(k-i+1)(k-i)} .$$


h. Show that

$$E[C] = \sum_{j=1}^{k-1} \frac{1}{j(j+1)} = 1 - \frac{1}{k} .$$
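
As a sanity check on the closed form in part (h), and not as a substitute for the proof the problem asks for, the sum can be evaluated exactly for small k. This Python sketch uses exact rational arithmetic; the function name sum_for_c is hypothetical.

```python
from fractions import Fraction


def sum_for_c(k):
    """Evaluate the sum of 1 / (j (j + 1)) for j = 1, ..., k - 1 exactly."""
    return sum(Fraction(1, j * (j + 1)) for j in range(1, k))


# Compare against the closed form 1 - 1/k for a range of small k.
matches = all(sum_for_c(k) == 1 - Fraction(1, k) for k in range(1, 50))
```

The agreement reflects the telescoping identity 1/(j(j+1)) = 1/j − 1/(j+1), which is one route to the proof.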

i. Use a symmetry argument to show that

$$E[D] = 1 - \frac{1}{n-k+1} .$$


j.  Conclude  that  the  expected  number  of  rotations  performed  when  inserting  a 
node  into  a  treap  is  less  than  2. 


Chapter  notes 

The idea of balancing a search tree is due to Adel’son-Vel’skiĭ and Landis [2], who
introduced  a  class  of  balanced  search  trees  called  “AVL  trees”  in  1962,  described  in 
Problem  13-3.  Another  class  of  search  trees,  called  “2-3  trees,”  was  introduced  by 
J.  E.  Hopcroft  (unpublished)  in  1970.  A  2-3  tree  maintains  balance  by  manipulating 
the  degrees  of  nodes  in  the  tree.  Chapter  18  covers  a  generalization  of  2-3  trees 
introduced  by  Bayer  and  McCreight  [35],  called  “B-trees.” 

Red-black  trees  were  invented  by  Bayer  [34]  under  the  name  “symmetric  binary 
B-trees.” Guibas and Sedgewick [155] studied their properties at length and
introduced the red/black color convention. Andersson [15] gives a simpler-to-code
variant of red-black trees. Weiss [351] calls this variant AA-trees. An AA-tree is
similar to a red-black tree except that left children may never be red.

Treaps,  the  subject  of  Problem  13-4,  were  proposed  by  Seidel  and  Aragon  [309]. 
They  are  the  default  implementation  of  a  dictionary  in  LEDA  [253],  which  is  a 
well-implemented  collection  of  data  structures  and  algorithms. 

There  are  many  other  variations  on  balanced  binary  trees,  including  weight- 
balanced trees [264], k-neighbor trees [245], and scapegoat trees [127]. Perhaps
the  most  intriguing  are  the  “splay  trees”  introduced  by  Sleator  and  Tarjan  [320], 
which  are  “self-adjusting.”  (See  Tarjan  [330]  for  a  good  description  of  splay  trees.) 
Splay  trees  maintain  balance  without  any  explicit  balance  condition  such  as  color. 
Instead,  “splay  operations”  (which  involve  rotations)  are  performed  within  the  tree 
every time an access is made. The amortized cost (see Chapter 17) of each
operation on an n-node tree is O(lg n).

Skip  lists  [286]  provide  an  alternative  to  balanced  binary  trees.  A  skip  list  is  a 
linked  list  that  is  augmented  with  a  number  of  additional  pointers.  Each  dictionary 
operation runs in expected time O(lg n) on a skip list of n items.
Some engineering situations require no more than a “textbook” data structure
(such as a doubly linked list, a hash table, or a binary search tree), but many
others require a dash of creativity. Only in rare situations will you need to
create an entirely new type of data structure, though. More often, it will suffice to
augment a textbook data structure by storing additional information in it. You can
then program new operations for the data structure to support the desired
application. Augmenting a data structure is not always straightforward, however, since the
added information must be updated and maintained by the ordinary operations on
the data structure.

This chapter discusses two data structures that we construct by augmenting
red-black trees. Section 14.1 describes a data structure that supports general
order-statistic operations on a dynamic set. We can then quickly find the ith smallest
number in a set or the rank of a given element in the total ordering of the set.
Section  14.2  abstracts  the  process  of  augmenting  a  data  structure  and  provides  a 
theorem  that  can  simplify  the  process  of  augmenting  red-black  trees.  Section  14.3 
uses  this  theorem  to  help  design  a  data  structure  for  maintaining  a  dynamic  set  of 
intervals,  such  as  time  intervals.  Given  a  query  interval,  we  can  then  quickly  find 
an  interval  in  the  set  that  overlaps  it. 


14.1  Dynamic  order  statistics 

Chapter 9 introduced the notion of an order statistic. Specifically, the ith order
statistic of a set of n elements, where i ∈ {1, 2, ..., n}, is simply the element in the
set with the ith smallest key. We saw how to determine any order statistic in O(n)
time from an unordered set. In this section, we shall see how to modify red-black
trees so that we can determine any order statistic for a dynamic set in O(lg n) time.
We shall also see how to compute the rank of an element (its position in the linear
order of the set) in O(lg n) time.
Figure 14.1 An order-statistic tree, which is an augmented red-black tree. Shaded nodes are red,
and darkened nodes are black. In addition to its usual attributes, each node x has an attribute x.size,
which is the number of nodes, other than the sentinel, in the subtree rooted at x.

Figure 14.1 shows a data structure that can support fast order-statistic operations.
An order-statistic tree T is simply a red-black tree with additional information
stored in each node. Besides the usual red-black tree attributes x.key, x.color, x.p,
x.left, and x.right in a node x, we have another attribute, x.size. This attribute
contains the number of (internal) nodes in the subtree rooted at x (including x
itself), that is, the size of the subtree. If we define the sentinel’s size to be 0 (that
is, we set T.nil.size to be 0), then we have the identity

x.size = x.left.size + x.right.size + 1 .

We  do  not  require  keys  to  be  distinct  in  an  order-statistic  tree.  (For  example,  the 
tree  in  Figure  14.1  has  two  keys  with  value  14  and  two  keys  with  value  21.)  In  the 
presence  of  equal  keys,  the  above  notion  of  rank  is  not  well  defined.  We  remove 
this  ambiguity  for  an  order-statistic  tree  by  defining  the  rank  of  an  element  as  the 
position at which it would be printed in an inorder walk of the tree. In Figure 14.1,
for  example,  the  key  14  stored  in  a  black  node  has  rank  5,  and  the  key  14  stored  in 
a  red  node  has  rank  6. 

Retrieving  an  element  with  a  given  rank 

Before we show how to maintain this size information during insertion and
deletion, let us examine the implementation of two order-statistic queries that use this
additional information. We begin with an operation that retrieves an element with
a given rank. The procedure OS-Select(x, i) returns a pointer to the node
containing the ith smallest key in the subtree rooted at x. To find the node with the ith
smallest key in an order-statistic tree T, we call OS-Select(T.root, i).
OS-Select(x, i)

1  r = x.left.size + 1
2  if i == r
3      return x
4  elseif i < r
5      return OS-Select(x.left, i)
6  else return OS-Select(x.right, i − r)

In line 1 of OS-Select, we compute r, the rank of node x within the subtree
rooted at x. The value of x.left.size is the number of nodes that come before x
in an inorder tree walk of the subtree rooted at x. Thus, x.left.size + 1 is the
rank of x within the subtree rooted at x. If i = r, then node x is the ith smallest
element, and so we return x in line 3. If i < r, then the ith smallest element
resides in x’s left subtree, and so we recurse on x.left in line 5. If i > r, then
the ith smallest element resides in x’s right subtree. Since the subtree rooted at x
contains r elements that come before x’s right subtree in an inorder tree walk, the
ith smallest element in the subtree rooted at x is the (i − r)th smallest element in
the subtree rooted at x.right. Line 6 determines this element recursively.

To see how OS-Select operates, consider a search for the 17th smallest
element in the order-statistic tree of Figure 14.1. We begin with x as the root, whose
key is 26, and with i = 17. Since the size of 26’s left subtree is 12, its rank is 13.
Thus, we know that the node with rank 17 is the 17 − 13 = 4th smallest element
in 26’s right subtree. After the recursive call, x is the node with key 41, and i = 4.
Since the size of 41’s left subtree is 5, its rank within its subtree is 6. Thus, we
know that the node with rank 4 is the 4th smallest element in 41’s left subtree.
After the recursive call, x is the node with key 30, and its rank within its subtree is 2.
Thus, we recurse once again to find the 4 − 2 = 2nd smallest element in the subtree
rooted at the node with key 38. We now find that its left subtree has size 1, which
means it is the second smallest element. Thus, the procedure returns a pointer to
the node with key 38.
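
The selection procedure translates almost line for line into Python. The sketch below is a minimal illustration under assumptions: it omits colors and parent pointers because selection reads only the size attribute, it uses None in place of the sentinel (with a helper size that returns 0 for None), and it builds a balanced but uncolored tree from a sorted list rather than reproducing the exact shape of the figure. The names Node, size, build, and os_select, and the sample key list, are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        # Maintain the identity size = left.size + right.size + 1.
        self.size = size(left) + size(right) + 1


def size(node):
    """Subtree size, treating a missing child like a sentinel of size 0."""
    return node.size if node is not None else 0


def build(sorted_keys):
    """Build a balanced search tree (no colors) from keys in sorted order."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return Node(sorted_keys[mid],
                build(sorted_keys[:mid]),
                build(sorted_keys[mid + 1:]))


def os_select(x, i):
    """Return the node holding the ith smallest key in the subtree at x.

    Assumes 1 <= i <= size(x).
    """
    r = size(x.left) + 1          # rank of x within its own subtree
    if i == r:
        return x
    elif i < r:
        return os_select(x.left, i)
    else:
        return os_select(x.right, i - r)


# Sample keys (with duplicates allowed), listed in sorted order.
keys = [3, 7, 10, 12, 14, 14, 16, 17, 19, 20,
        21, 21, 26, 28, 30, 35, 38, 39, 41, 47]
root = build(keys)
seventeenth = os_select(root, 17).key
```

With this sample list, the 17th smallest key is 38. The tree shape differs from a red-black tree built by insertions, but the result of selection depends only on the inorder sequence and correct sizes.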

Because  each  recursive  call  goes  down  one  level  in  the  order-statistic  tree,  the 
total  time  for  OS-Select  is  at  worst  proportional  to  the  height  of  the  tree.  Since 
the tree is a red-black tree, its height is O(lg n), where n is the number of nodes.
Thus, the running time of OS-Select is O(lg n) for a dynamic set of n elements.

Determining  the  rank  of  an  element 

Given  a  pointer  to  a  node  x  in  an  order-statistic  tree  T,  the  procedure  OS-Rank 
returns  the  position  of  x  in  the  linear  order  determined  by  an  inorder  tree  walk 
of  T. 
OS-Rank(T, x)

1  r = x.left.size + 1
2  y = x
3  while y ≠ T.root
4      if y == y.p.right
5          r = r + y.p.left.size + 1
6      y = y.p
7  return r

The  procedure  works  as  follows.  We  can  think  of  node  x’s  rank  as  the  number  of 
nodes preceding x in an inorder tree walk, plus 1 for x itself. OS-Rank maintains
the following loop invariant:

At the start of each iteration of the while loop of lines 3-6, r is the rank
of x.key in the subtree rooted at node y.

We use this loop invariant to show that OS-Rank works correctly as follows:

Initialization: Prior to the first iteration, line 1 sets r to be the rank of x.key within
the subtree rooted at x. Setting y = x in line 2 makes the invariant true the
first time the test in line 3 executes.

Maintenance: At the end of each iteration of the while loop, we set y = y.p.
Thus we must show that if r is the rank of x.key in the subtree rooted at y at the
start of the loop body, then r is the rank of x.key in the subtree rooted at y.p
at the end of the loop body. In each iteration of the while loop, we consider
the subtree rooted at y.p. We have already counted the number of nodes in the
subtree rooted at node y that precede x in an inorder walk, and so we must add
the nodes in the subtree rooted at y’s sibling that precede x in an inorder walk,
plus 1 for y.p if it, too, precedes x. If y is a left child, then neither y.p nor any
node in y.p’s right subtree precedes x, and so we leave r alone. Otherwise, y is
a right child and all the nodes in y.p’s left subtree precede x, as does y.p itself.
Thus, in line 5, we add y.p.left.size + 1 to the current value of r.

Termination: The loop terminates when y = T.root, so that the subtree rooted
at y is the entire tree. Thus, the value of r is the rank of x.key in the entire tree.

As an example, when we run OS-Rank on the order-statistic tree of Figure 14.1
to find the rank of the node with key 38, we get the following sequence of values
of y.key and r at the top of the while loop:

iteration   y.key    r
    1         38     2
    2         30     4
    3         41     4
    4         26    17

The procedure returns the rank 17.

Since each iteration of the while loop takes O(1) time, and y goes up one level in
the tree with each iteration, the running time of OS-Rank is at worst proportional
to the height of the tree: O(lg n) on an n-node order-statistic tree.
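
To make the upward walk concrete, the following Python sketch mirrors OS-Rank on a small size-augmented binary search tree. It is an illustration under stated assumptions rather than a definitive implementation: the names Node, size, build, find, and os_rank are ours, None stands in for the sentinel T.nil, and the tree is built balanced from sorted keys instead of being maintained as a red-black tree.

```python
class Node:
    """Tree node augmented with size, the number of nodes in its subtree."""
    def __init__(self, key, parent=None):
        self.key = key
        self.p = parent
        self.left = None
        self.right = None
        self.size = 1


def size(node):
    """Subtree size, with None playing the role of T.nil (size 0)."""
    return node.size if node is not None else 0


def build(sorted_keys, parent=None):
    """Build a balanced BST from sorted keys, filling in size bottom-up."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    node = Node(sorted_keys[mid], parent)
    node.left = build(sorted_keys[:mid], node)
    node.right = build(sorted_keys[mid + 1:], node)
    node.size = size(node.left) + size(node.right) + 1
    return node


def find(root, key):
    """Ordinary BST search, used here only to obtain a node pointer."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root


def os_rank(root, x):
    """Walk from x up to the root, accumulating the rank as OS-Rank does."""
    r = size(x.left) + 1            # rank of x within its own subtree
    y = x
    while y is not root:
        if y is y.p.right:
            # y.p and every node in y.p's left subtree precede x
            r += size(y.p.left) + 1
        y = y.p
    return r


keys = [14, 17, 21, 26, 30, 41, 47]
root = build(keys)
print(os_rank(root, find(root, 30)))   # prints 5: 30 is the fifth-smallest key
```

The loop body runs once per level between x and the root, which is where the bound proportional to the height comes from.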

Maintaining  subtree  sizes 

Given the size attribute in each node, OS-Select and OS-Rank can quickly
compute order-statistic information. But unless we can efficiently maintain these
attributes within the basic modifying operations on red-black trees, our work will
have been for naught. We shall now show how to maintain subtree sizes for both
insertion and deletion without affecting the asymptotic running time of either
operation.

We  noted  in  Section  13.3  that  insertion  into  a  red-black  tree  consists  of  two 
phases.  The  first  phase  goes  down  the  tree  from  the  root,  inserting  the  new  node 
as  a  child  of  an  existing  node.  The  second  phase  goes  up  the  tree,  changing  colors 
and  performing  rotations  to  maintain  the  red-black  properties. 

To maintain the subtree sizes in the first phase, we simply increment x.size for
each node x on the simple path traversed from the root down toward the leaves. The
new node added gets a size of 1. Since there are O(lg n) nodes on the traversed
path, the additional cost of maintaining the size attributes is O(lg n).
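
The first phase can be sketched in Python as follows. This is a simplified illustration, not the red-black insertion procedure itself: it performs plain binary-search-tree insertion with no recoloring or rotations, uses None for T.nil, and the helper names insert, count, and sizes_ok are hypothetical ones introduced here to check the stored sizes against a recount.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1          # a newly added node gets a size of 1


def insert(root, key):
    """Unbalanced BST insertion that increments size along the search path."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        cur.size += 1          # the new node will end up in cur's subtree
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def count(node):
    """Recount a subtree from scratch, for comparison with stored sizes."""
    return 0 if node is None else 1 + count(node.left) + count(node.right)


def sizes_ok(node):
    """True if every stored size in the subtree equals the recounted size."""
    if node is None:
        return True
    return (node.size == count(node)
            and sizes_ok(node.left) and sizes_ok(node.right))


root = None
for k in [41, 30, 47, 26, 38, 35, 39]:
    root = insert(root, k)
print(root.size, sizes_ok(root))   # prints: 7 True
```

Only the nodes visited on the way down are touched, so the extra work per insertion is proportional to the length of the search path.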

In  the  second  phase,  the  only  structural  changes  to  the  underlying  red-black  tree 
are  caused  by  rotations,  of  which  there  are  at  most  two.  Moreover,  a  rotation  is 
a  local  operation:  only  two  nodes  have  their  size  attributes  invalidated.  The  link 
around  which  the  rotation  is  performed  is  incident  on  these  two  nodes.  Referring 
to the code for Left-Rotate(T, x) in Section 13.2, we add the following lines:

13  y.size = x.size
14  x.size = x.left.size + x.right.size + 1

Figure 14.2 illustrates how the attributes are updated. The change to Right-Rotate
is symmetric.
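
As a hedged sketch of the two added lines in context, the Python below performs a left rotation and then repairs the two invalidated size attributes. The rotation body follows the usual formulation of a left rotation rather than anything reproduced in this section; the Tree and Node classes and the size helper are assumptions made for the example, and None replaces T.nil.

```python
def size(node):
    """Subtree size, with None standing in for the sentinel T.nil."""
    return node.size if node is not None else 0


class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.p = None
        for child in (left, right):
            if child is not None:
                child.p = self
        self.size = size(left) + size(right) + 1


class Tree:
    def __init__(self, root=None):
        self.root = root


def left_rotate(T, x):
    """Rotate left around the link between x and its right child y."""
    y = x.right                    # assumes x.right is a real node
    x.right = y.left               # y's left subtree becomes x's right subtree
    if y.left is not None:
        y.left.p = x
    y.p = x.p                      # link x's old parent to y
    if x.p is None:
        T.root = y
    elif x is x.p.left:
        x.p.left = y
    else:
        x.p.right = y
    y.left = x                     # put x on y's left
    x.p = y
    y.size = x.size                # y now roots the subtree x used to root
    x.size = size(x.left) + size(x.right) + 1


a, b, c = Node(1), Node(5), Node(9)
y = Node(7, b, c)
x = Node(3, a, y)
T = Tree(x)
left_rotate(T, x)
print(T.root.key, y.size, x.size)   # prints: 7 5 3
```

The order of the two assignments matters: y must copy x's old size before x.size is recomputed from x's new children.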

Since at most two rotations are performed during insertion into a red-black tree,
we spend only O(1) additional time updating size attributes in the second phase.
Thus, the total time for insertion into an n-node order-statistic tree is O(lg n),
which is asymptotically the same as for an ordinary red-black tree.

Deletion from a red-black tree also consists of two phases: the first operates
on the underlying search tree, and the second causes at most three rotations and
otherwise performs no structural changes. (See Section 13.4.) The first phase
either removes one node y from the tree or moves it upward within the tree. To
update the subtree sizes, we simply traverse a simple path from node y (starting
from its original position within the tree) up to the root, decrementing the size
attribute of each node on the path. Since this path has length O(lg n) in an n-node
red-black tree, the additional time spent maintaining size attributes in the first
phase is O(lg n). We handle the O(1) rotations in the second phase of deletion
in the same manner as for insertion. Thus, both insertion and deletion, including
maintaining the size attributes, take O(lg n) time for an n-node order-statistic tree.

Figure 14.2 Updating subtree sizes during rotations. The link around which we rotate is incident
on the two nodes whose size attributes need to be updated. The updates are local, requiring only the
size information stored in x, y, and the roots of the subtrees shown as triangles.

Exercises 


14.1-1 

Show how OS-Select(T.root, 10) operates on the red-black tree T of Figure 14.1.


14.1-2

Show how OS-Rank(T, x) operates on the red-black tree T of Figure 14.1 and
the node x with x.key = 35.


14.1-3

Write a nonrecursive version of OS-Select.


14.1-4

Write a recursive procedure OS-Key-Rank(T, k) that takes as input an order-statistic
tree T and a key k and returns the rank of k in the dynamic set represented
by T. Assume that the keys of T are distinct.


14.1-5

Given an element x in an n-node order-statistic tree and a natural number i, how
can we determine the ith successor of x in the linear order of the tree in O(lg n)
time?


14.1-6

Observe that whenever we reference the size attribute of a node in either OS-Select
or OS-Rank, we use it only to compute a rank. Accordingly, suppose
we  store  in  each  node  its  rank  in  the  subtree  of  which  it  is  the  root.  Show  how  to 
maintain  this  information  during  insertion  and  deletion.  (Remember  that  these  two 
operations  can  cause  rotations.) 


14.1-7

Show how to use an order-statistic tree to count the number of inversions (see
Problem 2-4) in an array of size n in time O(n lg n).


14.1-8 *

Consider n chords on a circle, each defined by its endpoints. Describe an O(n lg n)-time
algorithm to determine the number of pairs of chords that intersect inside the
circle. (For example, if the n chords are all diameters that meet at the center, then
the correct answer is $\binom{n}{2}$.) Assume that no two chords share an endpoint.


14.2  How  to  augment  a  data  structure 

The  process  of  augmenting  a  basic  data  structure  to  support  additional  functionality 
occurs  quite  frequently  in  algorithm  design.  We  shall  use  it  again  in  the  next  section 
to  design  a  data  structure  that  supports  operations  on  intervals.  In  this  section,  we 
examine  the  steps  involved  in  such  augmentation.  We  shall  also  prove  a  theorem 
that  allows  us  to  augment  red-black  trees  easily  in  many  cases. 

We  can  break  the  process  of  augmenting  a  data  structure  into  four  steps: 

1.  Choose  an  underlying  data  structure. 

2.  Determine  additional  information  to  maintain  in  the  underlying  data  structure. 

3.  Verify  that  we  can  maintain  the  additional  information  for  the  basic  modifying 
operations  on  the  underlying  data  structure. 

4.  Develop  new  operations. 

As  with  any  prescriptive  design  method,  you  should  not  blindly  follow  the  steps 
in  the  order  given.  Most  design  work  contains  an  element  of  trial  and  error,  and 
progress  on  all  steps  usually  proceeds  in  parallel.  There  is  no  point,  for  example,  in 
determining  additional  information  and  developing  new  operations  (steps  2  and  4) 
if we will not be able to maintain the additional information efficiently. Nevertheless,
this four-step method provides a good focus for your efforts in augmenting
a data structure, and it is also a good way to organize the documentation of an
augmented data structure.

We followed these steps in Section 14.1 to design our order-statistic trees. For
step 1, we chose red-black trees as the underlying data structure. A clue to the
suitability of red-black trees comes from their efficient support of other dynamic-set
operations on a total order, such as Minimum, Maximum, Successor, and
Predecessor.

For  step  2,  we  added  the  size  attribute,  in  which  each  node  x  stores  the  size  of  the 
subtree  rooted  at  x.  Generally,  the  additional  information  makes  operations  more 
efficient.  For  example,  we  could  have  implemented  OS-Select  and  OS-Rank 
using just the keys stored in the tree, but they would not have run in O(lg n) time.
Sometimes,  the  additional  information  is  pointer  information  rather  than  data,  as 
in  Exercise  14.2-1. 

For step 3, we ensured that insertion and deletion could maintain the size attributes
while still running in O(lg n) time. Ideally, we should need to update only
a few elements of the data structure in order to maintain the additional information.
For example, if we simply stored in each node its rank in the tree, the OS-Select
and OS-Rank procedures would run quickly, but inserting a new minimum element
would cause a change to this information in every node of the tree. When we
store subtree sizes instead, inserting a new element causes information to change
in only O(lg n) nodes.
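
The contrast between storing ranks and storing subtree sizes can be checked on a small experiment. The Python sketch below is an assumption-laden illustration: it uses a perfectly balanced plain binary search tree in place of a red-black tree, performs the insertion without rebalancing, and recomputes both attributes from scratch before and after so that the number of changed values can be counted. All function names are ours.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def build(sorted_keys):
    """Balanced BST over sorted keys (stands in for a red-black tree)."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    node = Node(sorted_keys[mid])
    node.left = build(sorted_keys[:mid])
    node.right = build(sorted_keys[mid + 1:])
    return node


def insert(root, key):
    """Plain BST insertion without rebalancing."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def ranks(root):
    """Map each key to its rank in the whole tree, via an inorder walk."""
    out = {}
    def walk(node):
        if node is not None:
            walk(node.left)
            out[node.key] = len(out) + 1
            walk(node.right)
    walk(root)
    return out


def sizes(root):
    """Map each key to the size of the subtree rooted at its node."""
    out = {}
    def walk(node):
        if node is None:
            return 0
        out[node.key] = walk(node.left) + walk(node.right) + 1
        return out[node.key]
    walk(root)
    return out


def changed(before, after):
    """Number of pre-existing keys whose stored value differs afterwards."""
    return sum(1 for k in before if before[k] != after[k])


n = 1023
root = build(list(range(1, n + 1)))
rank_before, size_before = ranks(root), sizes(root)
root = insert(root, 0)                      # a new minimum element
rank_after, size_after = ranks(root), sizes(root)
print(changed(rank_before, rank_after))     # 1023: every stored rank shifts
print(changed(size_before, size_after))     # 10: only nodes on the insertion path
```

With n = 1023 the balanced tree has 10 levels, so the insertion path to the new minimum passes through exactly 10 existing nodes.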

For  step  4,  we  developed  the  operations  OS-Select  and  OS-Rank.  After  all, 
the  need  for  new  operations  is  why  we  bother  to  augment  a  data  structure  in  the  first 
place.  Occasionally,  rather  than  developing  new  operations,  we  use  the  additional 
information  to  expedite  existing  ones,  as  in  Exercise  14.2-1. 

Augmenting  red-black  trees 

When red-black trees underlie an augmented data structure, we can prove that insertion
and deletion can always efficiently maintain certain kinds of additional information,
thereby making step 3 very easy. The proof of the following theorem is
similar to the argument from Section 14.1 that we can maintain the size attribute
for order-statistic trees.

Theorem 14.1 (Augmenting a red-black tree)

Let f be an attribute that augments a red-black tree T of n nodes, and suppose that
the value of f for each node x depends on only the information in nodes x, x.left,
and x.right, possibly including x.left.f and x.right.f. Then, we can maintain the
values of f in all nodes of T during insertion and deletion without asymptotically
affecting the O(lg n) performance of these operations.

Proof The main idea of the proof is that a change to an f attribute in a node x
propagates only to ancestors of x in the tree. That is, changing x.f may require
x.p.f to be updated, but nothing else; updating x.p.f may require x.p.p.f
to be updated, but nothing else; and so on up the tree. Once we have updated
T.root.f, no other node will depend on the new value, and so the process terminates.
Since the height of a red-black tree is O(lg n), changing an f attribute in a
node costs O(lg n) time in updating all nodes that depend on the change.

Insertion of a node x into T consists of two phases. (See Section 13.3.) The
first phase inserts x as a child of an existing node x.p. We can compute the value
of x.f in O(1) time since, by supposition, it depends only on information in the
other attributes of x itself and the information in x’s children, but x’s children are
both the sentinel T.nil. Once we have computed x.f, the change propagates up
the tree. Thus, the total time for the first phase of insertion is O(lg n). During the
second phase, the only structural changes to the tree come from rotations. Since
only two nodes change in a rotation, the total time for updating the f attributes
is O(lg n) per rotation. Since the number of rotations during insertion is at most
two, the total time for insertion is O(lg n).

Like insertion, deletion has two phases. (See Section 13.4.) In the first phase,
changes to the tree occur when the deleted node is removed from the tree. If the
deleted node had two children at the time, then its successor moves into the position
of the deleted node. Propagating the updates to f caused by these changes costs
at most O(lg n), since the changes modify the tree locally. Fixing up the red-black
tree during the second phase requires at most three rotations, and each rotation
requires at most O(lg n) time to propagate the updates to f. Thus, like insertion,
the total time for deletion is O(lg n). ■

In many cases, such as maintaining the size attributes in order-statistic trees, the
cost of updating after a rotation is O(1), rather than the O(lg n) derived in the proof
of Theorem 14.1. Exercise 14.2-3 gives an example.
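
The propagation argument in the proof of Theorem 14.1 can be sketched generically. In the Python below, which is a minimal illustration and not the red-black machinery itself, an attribute f is described by a function compute(x) that may look only at x and its children; propagate recomputes f at a node and at each of its ancestors. The insertion is plain binary-search-tree insertion with no rotations, None stands in for T.nil, and all names are assumptions made for the example.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.p = parent
        self.left = None
        self.right = None
        self.f = None


def propagate(x, compute):
    """Recompute f at x and at every ancestor of x; return the nodes touched."""
    touched = 0
    while x is not None:
        x.f = compute(x)
        touched += 1
        x = x.p
    return touched


def insert(root, key, compute):
    """First phase of insertion on a plain BST, then propagation up the tree."""
    parent, cur = None, root
    while cur is not None:
        parent = cur
        cur = cur.left if key < cur.key else cur.right
    new = Node(key, parent)
    if parent is None:
        root = new
    elif key < parent.key:
        parent.left = new
    else:
        parent.right = new
    propagate(new, compute)        # only new and its ancestors can change
    return root


# Two attributes meeting the theorem's hypothesis: each depends only on
# x, x.left, and x.right (including the children's own f values).
def subtree_size(x):
    return 1 + sum(c.f for c in (x.left, x.right) if c is not None)


def subtree_max_key(x):
    return max([x.key] + [c.f for c in (x.left, x.right) if c is not None])


root = None
for k in [8, 3, 10, 1, 6, 14, 4, 7, 13]:
    root = insert(root, k, subtree_size)
print(root.f, root.left.f)   # prints: 9 5
```

Swapping subtree_size for subtree_max_key changes the attribute without touching insert or propagate, which is the practical content of the theorem.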

Exercises 


14.2-1 

Show,  by  adding  pointers  to  the  nodes,  how  to  support  each  of  the  dynamic-set 
queries Minimum, Maximum, Successor, and Predecessor in O(1) worst-case
time on an augmented order-statistic tree. The asymptotic performance of
other  operations  on  order-statistic  trees  should  not  be  affected. 


14.2-2 

Can we maintain the black-heights of nodes in a red-black tree as attributes in the
nodes of the tree without affecting the asymptotic performance of any of the red-black
tree operations? Show how, or argue why not. How about maintaining the
depths of nodes?


14.2-3 *

Let ⊗ be an associative binary operator, and let a be an attribute maintained in each
node of a red-black tree. Suppose that we want to include in each node x an additional
attribute f such that x.f = x_1.a ⊗ x_2.a ⊗ ⋯ ⊗ x_m.a, where x_1, x_2, ..., x_m
is the inorder listing of nodes in the subtree rooted at x. Show how to update the f
attributes in O(1) time after a rotation. Modify your argument slightly to apply it
to the size attributes in order-statistic trees.

14.2-4 *

We wish to augment red-black trees with an operation RB-Enumerate(x, a, b)
that outputs all the keys k such that a ≤ k ≤ b in a red-black tree rooted at x.
Describe how to implement RB-Enumerate in Θ(m + lg n) time, where m is the
number of keys that are output and n is the number of internal nodes in the tree.
(Hint: You do not need to add new attributes to the red-black tree.)


14.3  Interval  trees 

In this section, we shall augment red-black trees to support operations on dynamic
sets of intervals. A closed interval is an ordered pair of real numbers [t_1, t_2], with
t_1 ≤ t_2. The interval [t_1, t_2] represents the set {t ∈ ℝ : t_1 ≤ t ≤ t_2}. Open and
half-open intervals omit both or one of the endpoints from the set, respectively. In
this section, we shall assume that intervals are closed; extending the results to open
and half-open intervals is conceptually straightforward.

Intervals  are  convenient  for  representing  events  that  each  occupy  a  continuous 
period  of  time.  We  might,  for  example,  wish  to  query  a  database  of  time  intervals 
to  find  out  what  events  occurred  during  a  given  interval.  The  data  structure  in  this 
section  provides  an  efficient  means  for  maintaining  such  an  interval  database. 

We can represent an interval [t_1, t_2] as an object i, with attributes i.low = t_1
(the low endpoint) and i.high = t_2 (the high endpoint). We say that intervals i
and i' overlap if i ∩ i' ≠ ∅, that is, if i.low ≤ i'.high and i'.low ≤ i.high. As
Figure 14.3 shows, any two intervals i and i' satisfy the interval trichotomy, that
is, exactly one of the following three properties holds:

a.  i  and  i'  overlap, 

b. i is to the left of i' (i.e., i.high < i'.low),

c. i is to the right of i' (i.e., i'.high < i.low).
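
The three cases translate directly into predicates. The Python sketch below is illustrative only; the Interval tuple and the function names are assumptions introduced here, and intervals are taken to be closed with low ≤ high. The helper trichotomy_case asserts that exactly one of the three properties holds for the pair it is given.

```python
from collections import namedtuple

# A closed interval [low, high]; the caller is assumed to ensure low <= high.
Interval = namedtuple("Interval", ["low", "high"])


def overlaps(i, j):
    """Case a: the two closed intervals share at least one point."""
    return i.low <= j.high and j.low <= i.high


def left_of(i, j):
    """Case b: i lies entirely to the left of j."""
    return i.high < j.low


def right_of(i, j):
    """Case c: i lies entirely to the right of j."""
    return j.high < i.low


def trichotomy_case(i, j):
    """Return 'a', 'b', or 'c', checking that exactly one case applies."""
    cases = [overlaps(i, j), left_of(i, j), right_of(i, j)]
    assert sum(cases) == 1
    return "abc"[cases.index(True)]


print(trichotomy_case(Interval(15, 23), Interval(22, 25)))   # a
print(trichotomy_case(Interval(8, 9), Interval(22, 25)))     # b
print(trichotomy_case(Interval(25, 30), Interval(11, 14)))   # c
```

Because the endpoints are included, two intervals that merely touch, such as [5, 8] and [8, 9], fall under case a.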

An interval tree is a red-black tree that maintains a dynamic set of elements, with
each element x containing an interval x.int. Interval trees support the following
operations:

Interval-Insert(T, x) adds the element x, whose int attribute is assumed to
contain an interval, to the interval tree T.

Interval-Delete(T, x) removes the element x from the interval tree T.

Interval-Search(T, i) returns a pointer to an element x in the interval tree T
such that x.int overlaps interval i, or a pointer to the sentinel T.nil if no such
element is in the set.

Figure 14.3 The interval trichotomy for two closed intervals i and i'. (a) If i and i' overlap, there
are four situations; in each, i.low ≤ i'.high and i'.low ≤ i.high. (b) The intervals do not overlap,
and i.high < i'.low. (c) The intervals do not overlap, and i'.high < i.low.

Figure  14.4  shows  how  an  interval  tree  represents  a  set  of  intervals.  We  shall  track 
the  four-step  method  from  Section  14.2  as  we  review  the  design  of  an  interval  tree 
and  the  operations  that  run  on  it. 

Step  1:  Underlying  data  structure 

We choose a red-black tree in which each node x contains an interval x.int and the
key of x is the low endpoint, x.int.low, of the interval. Thus, an inorder tree walk
of the data structure lists the intervals in sorted order by low endpoint.

Step  2:  Additional  information 

In addition to the intervals themselves, each node x contains a value x.max, which
is the maximum value of any interval endpoint stored in the subtree rooted at x.

Step  3:  Maintaining  the  information 

We must verify that insertion and deletion take O(lg n) time on an interval tree
of n nodes. We can determine x.max given interval x.int and the max values of
node x’s children:

    x.max = max(x.int.high, x.left.max, x.right.max).

Thus, by Theorem 14.1, insertion and deletion run in O(lg n) time. In fact, we
can update the max attributes after a rotation in O(1) time, as Exercises 14.2-3
and 14.3-1 show.

Figure 14.4 An interval tree. (a) A set of 10 intervals, shown sorted bottom to top by left endpoint:
[0, 3], [5, 8], [6, 10], [8, 9], [15, 23], [16, 21], [17, 19], [19, 20], [25, 30], [26, 26].
(b) The interval tree that represents them. Each node x contains an interval, shown above the dashed
line, and the maximum value of any interval endpoint in the subtree rooted at x, shown below the
dashed line. An inorder tree walk of the tree lists the nodes in sorted order by left endpoint.
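
The formula for x.max can be exercised on the ten intervals of Figure 14.4. The Python sketch below makes several assumptions that should be read as such: it uses plain unbalanced insertion keyed on the low endpoint instead of red-black insertion, None replaces T.nil, and the insertion order is chosen by us so that the top of the resulting tree agrees with the part of the tree the text describes (root [16, 21], left child [8, 9], whose right child is [15, 23]). The helper names are hypothetical.

```python
class Node:
    def __init__(self, low, high):
        self.low, self.high = low, high    # the interval x.int
        self.left = None
        self.right = None
        self.max = high


def update_max(x):
    """x.max = max(x.int.high, x.left.max, x.right.max), skipping absent children."""
    x.max = max([x.high] + [c.max for c in (x.left, x.right) if c is not None])


def insert(root, low, high):
    """Unbalanced BST insertion keyed on low; refreshes max along the path."""
    if root is None:
        return Node(low, high)
    if low < root.low:
        root.left = insert(root.left, low, high)
    else:
        root.right = insert(root.right, low, high)
    update_max(root)     # children are already up to date when we return here
    return root


def brute_max(x):
    """Largest endpoint in x's subtree, recomputed from scratch."""
    if x is None:
        return float("-inf")
    return max(x.high, brute_max(x.left), brute_max(x.right))


def consistent(x):
    """True if every stored max in the subtree matches the recomputed value."""
    return x is None or (x.max == brute_max(x)
                         and consistent(x.left) and consistent(x.right))


INTERVALS = [(16, 21), (8, 9), (25, 30), (5, 8), (15, 23),
             (17, 19), (26, 26), (0, 3), (6, 10), (19, 20)]
root = None
for low, high in INTERVALS:
    root = insert(root, low, high)
print(root.max, root.left.max, root.left.left.max)   # prints: 30 23 10
print(consistent(root))                              # prints: True
```

Only the nodes on the insertion path have their max recomputed, which matches the propagation pattern used in the proof of Theorem 14.1.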

Step  4:  Developing  new  operations 

The only new operation we need is Interval-Search(T, i), which finds a node
in tree T whose interval overlaps interval i. If there is no interval that overlaps i in
the tree, the procedure returns a pointer to the sentinel T.nil.

Interval-Search(T, i)
1  x = T.root
2  while x ≠ T.nil and i does not overlap x.int
3      if x.left ≠ T.nil and x.left.max ≥ i.low
4          x = x.left
5      else x = x.right
6  return x

The search for an interval that overlaps i starts with x at the root of the tree and
proceeds downward. It terminates when either it finds an overlapping interval or x
points to the sentinel T.nil. Since each iteration of the basic loop takes O(1) time,
and since the height of an n-node red-black tree is O(lg n), the Interval-Search
procedure takes O(lg n) time.

Before we see why Interval-Search is correct, let’s examine how it works
on the interval tree in Figure 14.4. Suppose we wish to find an interval that overlaps
the interval i = [22, 25]. We begin with x as the root, which contains [16, 21] and
does not overlap i. Since x.left.max = 23 is greater than i.low = 22, the loop
continues with x as the left child of the root, the node containing [8, 9], which also
does not overlap i. This time, x.left.max = 10 is less than i.low = 22, and so the
loop continues with the right child of x as the new x. Because the interval [15, 23]
stored in this node overlaps i, the procedure returns this node.

As an example of an unsuccessful search, suppose we wish to find an interval
that overlaps i = [11, 14] in the interval tree of Figure 14.4. We once again begin
with x as the root. Since the root’s interval [16, 21] does not overlap i, and
since x.left.max = 23 is greater than i.low = 11, we go left to the node containing
[8, 9]. Interval [8, 9] does not overlap i, and x.left.max = 10 is less than
i.low = 11, and so we go right. (Note that no interval in the left subtree overlaps i.)
Interval [15, 23] does not overlap i, and its left child is T.nil, so again we
go right, the loop terminates, and we return the sentinel T.nil.
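
Both searches just traced can be reproduced with a short Python sketch. As before, this is an illustration under assumptions and not a full interval tree: the tree is an unbalanced binary search tree keyed on low endpoints, built with an insertion order we chose so that its shape agrees with the nodes the walkthrough visits; None replaces T.nil; and a query interval is passed as two numbers, low and high.

```python
class Node:
    def __init__(self, low, high):
        self.low, self.high = low, high    # the interval x.int
        self.left = None
        self.right = None
        self.max = high


def insert(root, low, high):
    """Unbalanced BST insertion keyed on low; keeps max current on the path."""
    if root is None:
        return Node(low, high)
    if low < root.low:
        root.left = insert(root.left, low, high)
    else:
        root.right = insert(root.right, low, high)
    root.max = max(root.max, high)     # the new interval lies in root's subtree
    return root


def overlaps(x, low, high):
    """True if the closed interval stored in x overlaps [low, high]."""
    return x.low <= high and low <= x.high


def interval_search(root, low, high):
    """Follows Interval-Search; returns a node, or None in place of T.nil."""
    x = root
    while x is not None and not overlaps(x, low, high):
        if x.left is not None and x.left.max >= low:
            x = x.left
        else:
            x = x.right
    return x


FIGURE_14_4 = [(16, 21), (8, 9), (25, 30), (5, 8), (15, 23),
               (17, 19), (26, 26), (0, 3), (6, 10), (19, 20)]
root = None
for low, high in FIGURE_14_4:
    root = insert(root, low, high)

hit = interval_search(root, 22, 25)
print((hit.low, hit.high))             # prints (15, 23)
print(interval_search(root, 11, 14))   # prints None
```

Each loop iteration descends one level, so the number of iterations is bounded by the height of the tree.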

To  see  why  Interval-Search  is  correct,  we  must  understand  why  it  suffices 
to  examine  a  single  path  from  the  root.  The  basic  idea  is  that  at  any  node  x, 
if  x.int  does  not  overlap  i,  the  search  always  proceeds  in  a  safe  direction:  the 
search  will  definitely  find  an  overlapping  interval  if  the  tree  contains  one.  The 
following  theorem  states  this  property  more  precisely. 

Theorem  14.2 

Any execution of Interval-Search(T, i) either returns a node whose interval
overlaps i, or it returns T.nil and the tree T contains no node whose interval
overlaps i.

Figure 14.5 Intervals in the proof of Theorem 14.2. The value of x.left.max is shown in each case
as a dashed line. (a) The search goes right. No interval i' in x’s left subtree can overlap i. (b) The
search goes left. The left subtree of x contains an interval that overlaps i (situation not shown),
or x’s left subtree contains an interval i' such that i'.high = x.left.max. Since i does not overlap i',
neither does it overlap any interval i'' in x’s right subtree, since i'.low ≤ i''.low.

Proof The while loop of lines 2-5 terminates either when x = T.nil or i overlaps
x.int. In the latter case, it is certainly correct to return x. Therefore, we focus
on the former case, in which the while loop terminates because x = T.nil.

We  use  the  following  invariant  for  the  while  loop  of  lines  2-5: 

If  tree  T  contains  an  interval  that  overlaps  i,  then  the  subtree  rooted  at  x 
contains  such  an  interval. 

We  use  this  loop  invariant  as  follows: 

Initialization:  Prior  to  the  first  iteration,  line  1  sets  x  to  be  the  root  of  T,  so  that 
the  invariant  holds. 

Maintenance:  Each  iteration  of  the  while  loop  executes  either  line  4  or  line  5.  We 
shall  show  that  both  cases  maintain  the  loop  invariant. 

If line 5 is executed, then because of the branch condition in line 3, we
have x.left = T.nil, or x.left.max < i.low. If x.left = T.nil, the subtree
rooted at x.left clearly contains no interval that overlaps i, and so setting x
to x.right maintains the invariant. Suppose, therefore, that x.left ≠ T.nil and
x.left.max < i.low. As Figure 14.5(a) shows, for each interval i' in x’s left
subtree, we have

    i'.high ≤ x.left.max
            < i.low.

By the interval trichotomy, therefore, i' and i do not overlap. Thus, the left
subtree of x contains no intervals that overlap i, so that setting x to x.right
maintains the invariant.

If, on the other hand, line 4 is executed, then we will show that the contrapositive
of the loop invariant holds. That is, if the subtree rooted at x.left contains
no interval overlapping i, then no interval anywhere in the tree overlaps i.
Since line 4 is executed, then because of the branch condition in line 3, we
have x.left.max ≥ i.low. Moreover, by definition of the max attribute, x’s left
subtree must contain some interval i' such that

    i'.high = x.left.max
            ≥ i.low.

(Figure 14.5(b) illustrates the situation.) Since i and i' do not overlap, and
since it is not true that i'.high < i.low, it follows by the interval trichotomy
that i.high < i'.low. Interval trees are keyed on the low endpoints of intervals,
and thus the search-tree property implies that for any interval i'' in x’s right
subtree,

    i.high < i'.low
           ≤ i''.low.

By the interval trichotomy, i and i'' do not overlap. We conclude that whether
or not any interval in x’s left subtree overlaps i, setting x to x.left maintains
the invariant.

Termination: If the loop terminates when x = T.nil, then the subtree rooted at x
contains no interval overlapping i. The contrapositive of the loop invariant
implies that T contains no interval that overlaps i. Hence it is correct to return
x = T.nil. ■

Thus,  the  Interval-Search  procedure  works  correctly. 
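
Theorem 14.2 can also be checked empirically, which is no substitute for the proof but is a useful sanity test of an implementation. The Python sketch below compares the single-path search against a brute-force scan on randomly generated sets of closed intervals. It assumes an unbalanced binary search tree keyed on low endpoints (ties go to the right subtree), None in place of T.nil, and names of our own choosing.

```python
import random


class Node:
    def __init__(self, low, high):
        self.low, self.high = low, high
        self.left = None
        self.right = None
        self.max = high


def insert(root, low, high):
    """Unbalanced BST insertion keyed on low; keeps max current on the path."""
    if root is None:
        return Node(low, high)
    if low < root.low:
        root.left = insert(root.left, low, high)
    else:
        root.right = insert(root.right, low, high)
    root.max = max(root.max, high)
    return root


def interval_search(root, low, high):
    """Single-path search; returns an overlapping node or None."""
    x = root
    while x is not None and not (x.low <= high and low <= x.high):
        if x.left is not None and x.left.max >= low:
            x = x.left
        else:
            x = x.right
    return x


def check(trials=500, seed=0):
    """Compare interval_search with a linear scan on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        intervals, root = [], None
        for _ in range(rng.randint(0, 12)):
            a = rng.randint(0, 30)
            b = rng.randint(a, 30)
            intervals.append((a, b))
            root = insert(root, a, b)
        lo = rng.randint(0, 30)
        hi = rng.randint(lo, 30)
        found = interval_search(root, lo, hi)
        exists = any(a <= hi and lo <= b for a, b in intervals)
        if found is None and exists:
            return False               # missed an overlapping interval
        if found is not None and not (found.low <= hi and lo <= found.high):
            return False               # returned a non-overlapping node
    return True


print(check())
```

The check never relies on balance, since the correctness argument depends only on the search-tree property and the max attribute; balance matters only for the running time.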

Exercises 


14.3-1 

Write pseudocode for Left-Rotate that operates on nodes in an interval tree and
updates the max attributes in O(1) time.


14.3-2

Rewrite the code for Interval-Search so that it works properly when all intervals
are open.


14.3-3

Describe an efficient algorithm that, given an interval i, returns an interval overlapping
i that has the minimum low endpoint, or T.nil if no such interval exists.


14.3-4

Given an interval tree T and an interval i, describe how to list all intervals in T
that overlap i in O(min(n, k lg n)) time, where k is the number of intervals in the
output list. (Hint: One simple method makes several queries, modifying the tree
between queries. A slightly more complicated method does not modify the tree.)


14.3-5

Suggest modifications to the interval-tree procedures to support the new operation
Interval-Search-Exactly(T, i), where T is an interval tree and i is
an interval. The operation should return a pointer to a node x in T such that
x.int.low = i.low and x.int.high = i.high, or T.nil if T contains no such node.
All operations, including Interval-Search-Exactly, should run in O(lg n)
time on an n-node interval tree.


14.3-6

Show how to maintain a dynamic set Q of numbers that supports the operation
Min-Gap, which gives the magnitude of the difference of the two closest numbers
in Q. For example, if Q = {1, 5, 9, 15, 18, 22}, then Min-Gap(Q) returns
18 − 15 = 3, since 15 and 18 are the two closest numbers in Q. Make the operations
Insert, Delete, Search, and Min-Gap as efficient as possible, and
analyze their running times.


14.3-7 *

VLSI databases commonly represent an integrated circuit as a list of rectangles.
Assume that each rectangle is rectilinearly oriented (sides parallel to the
x- and y-axes), so that we represent a rectangle by its minimum and maximum x-
and y-coordinates. Give an O(n lg n)-time algorithm to decide whether or not a set
of n rectangles so represented contains two rectangles that overlap. Your algorithm
need not report all intersecting pairs, but it must report that an overlap exists if one
rectangle entirely covers another, even if the boundary lines do not intersect. (Hint:
Move a “sweep” line across the set of rectangles.)


Problems 


14-1  Point  of  maximum  overlap 

Suppose  that  we  wish  to  keep  track  of  a  point  of  maximum  overlap  in  a  set  of 
intervals— a  point  with  the  largest  number  of  intervals  in  the  set  that  overlap  it. 

a. Show that there will always be a point of maximum overlap that is an endpoint
of one of the segments.

b. Design a data structure that efficiently supports the operations Interval-Insert,
Interval-Delete, and Find-POM, which returns a point of maximum
overlap. (Hint: Keep a red-black tree of all the endpoints. Associate
a value of +1 with each left endpoint, and associate a value of −1 with each
right endpoint. Augment each node of the tree with some extra information to
maintain the point of maximum overlap.)

14-2  Josephus  permutation 

We define the Josephus problem as follows. Suppose that n people form a circle
and that we are given a positive integer m ≤ n. Beginning with a designated
first person, we proceed around the circle, removing every mth person. After each
person is removed, counting continues around the circle that remains. This process
continues until we have removed all n people. The order in which the people are
removed from the circle defines the (n, m)-Josephus permutation of the integers
1, 2, ..., n. For example, the (7, 3)-Josephus permutation is ⟨3, 6, 2, 7, 5, 1, 4⟩.

a. Suppose that m is a constant. Describe an O(n)-time algorithm that, given an
integer n, outputs the (n, m)-Josephus permutation.

b. Suppose that m is not a constant. Describe an O(n lg n)-time algorithm that,
given integers n and m, outputs the (n, m)-Josephus permutation.


Chapter  notes 

In  their  book,  Preparata  and  Shamos  [282]  describe  several  of  the  interval  trees 
that  appear  in  the  literature,  citing  work  by  H.  Edelsbrunner  (1980)  and  E.  M. 
McCreight  (1981).  The  book  details  an  interval  tree  that,  given  a  static  database 
of  n  intervals,  allows  us  to  enumerate  all  k  intervals  that  overlap  a  given  query 
interval in O(k + lg n) time.


Introduction


This part covers three important techniques used in designing and analyzing efficient
algorithms: dynamic programming (Chapter 15), greedy algorithms (Chapter 16),
and amortized analysis (Chapter 17). Earlier parts have presented other
widely  applicable  techniques,  such  as  divide-and-conquer,  randomization,  and  how 
to  solve  recurrences.  The  techniques  in  this  part  are  somewhat  more  sophisticated, 
but  they  help  us  to  attack  many  computational  problems.  The  themes  introduced  in 
this  part  will  recur  later  in  this  book. 

Dynamic  programming  typically  applies  to  optimization  problems  in  which  we 
make  a  set  of  choices  in  order  to  arrive  at  an  optimal  solution.  As  we  make 
each  choice,  subproblems  of  the  same  form  often  arise.  Dynamic  programming 
is  effective  when  a  given  subproblem  may  arise  from  more  than  one  partial  set  of 
choices;  the  key  technique  is  to  store  the  solution  to  each  such  subproblem  in  case  it 
should  reappear.  Chapter  15  shows  how  this  simple  idea  can  sometimes  transform 
exponential-time  algorithms  into  polynomial-time  algorithms. 

Like  dynamic-programming  algorithms,  greedy  algorithms  typically  apply  to 
optimization  problems  in  which  we  make  a  set  of  choices  in  order  to  arrive  at  an 
optimal  solution.  The  idea  of  a  greedy  algorithm  is  to  make  each  choice  in  a  locally 
optimal  manner.  A  simple  example  is  coin-changing:  to  minimize  the  number  of 
U.S.  coins  needed  to  make  change  for  a  given  amount,  we  can  repeatedly  select 
the  largest-denomination  coin  that  is  not  larger  than  the  amount  that  remains.  A 
greedy  approach  provides  an  optimal  solution  for  many  such  problems  much  more 
quickly  than  would  a  dynamic-programming  approach.  We  cannot  always  easily 
tell whether a greedy approach will be effective, however. Chapter 16 introduces
matroid theory, which provides a mathematical basis that can help us to show that
a  greedy  algorithm  yields  an  optimal  solution. 
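The coin-changing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `greedy_change` and the default denominations (quarter, dime, nickel, and penny, in cents) are choices made for this example, not something the text specifies.

```python
def greedy_change(amount, denominations=(25, 10, 5, 1)):
    """Make change for `amount` cents, always taking the largest coin that fits.

    The default denominations are an assumption for this sketch
    (U.S. quarter, dime, nickel, penny).
    """
    coins = []
    for coin in sorted(denominations, reverse=True):
        # Repeatedly select this coin while it is not larger than what remains.
        while amount >= coin:
            coins.append(coin)
            amount -= coin
    return coins


print(greedy_change(63))  # [25, 25, 10, 1, 1, 1]
```

For other coin systems the same locally optimal rule may fail to minimize the number of coins, which is why an argument that the greedy choice is safe is needed.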

We  use  amortized  analysis  to  analyze  certain  algorithms  that  perform  a  sequence 
of  similar  operations.  Instead  of  bounding  the  cost  of  the  sequence  of  operations 
by  bounding  the  actual  cost  of  each  operation  separately,  an  amortized  analysis 
provides  a  bound  on  the  actual  cost  of  the  entire  sequence.  One  advantage  of  this 
approach  is  that  although  some  operations  might  be  expensive,  many  others  might 
be  cheap.  In  other  words,  many  of  the  operations  might  run  in  well  under  the  worst- 
case  time.  Amortized  analysis  is  not  just  an  analysis  tool,  however;  it  is  also  a  way 
of  thinking  about  the  design  of  algorithms,  since  the  design  of  an  algorithm  and  the 
analysis  of  its  running  time  are  often  closely  intertwined.  Chapter  17  introduces 
three  ways  to  perform  an  amortized  analysis  of  an  algorithm. 


15 


Dynamic  Programming 


Dynamic  programming,  like  the  divide-and-conquer  method,  solves  problems  by 
combining  the  solutions  to  subproblems.  (“Programming”  in  this  context  refers 
to  a  tabular  method,  not  to  writing  computer  code.)  As  we  saw  in  Chapters  2 
and 4, divide-and-conquer algorithms partition the problem into disjoint subproblems,
solve the subproblems recursively, and then combine their solutions to solve
the original problem. In contrast, dynamic programming applies when the subproblems
overlap, that is, when subproblems share subsubproblems. In this context,
a divide-and-conquer algorithm does more work than necessary, repeatedly solving
the common subsubproblems. A dynamic-programming algorithm solves each
subsubproblem  just  once  and  then  saves  its  answer  in  a  table,  thereby  avoiding  the 
work  of  recomputing  the  answer  every  time  it  solves  each  subsubproblem. 

We typically apply dynamic programming to optimization problems. Such problems
can have many possible solutions. Each solution has a value, and we wish to
find  a  solution  with  the  optimal  (minimum  or  maximum)  value.  We  call  such  a 
solution  an  optimal  solution  to  the  problem,  as  opposed  to  the  optimal  solution, 
since  there  may  be  several  solutions  that  achieve  the  optimal  value. 

When  developing  a  dynamic-programming  algorithm,  we  follow  a  sequence  of 
four  steps: 

1. Characterize the structure of an optimal solution.

2.  Recursively  define  the  value  of  an  optimal  solution. 

3.  Compute  the  value  of  an  optimal  solution,  typically  in  a  bottom-up  fashion. 

4.  Construct  an  optimal  solution  from  computed  information. 

Steps 1–3 form the basis of a dynamic-programming solution to a problem. If we
need  only  the  value  of  an  optimal  solution,  and  not  the  solution  itself,  then  we 
can  omit  step  4.  When  we  do  perform  step  4,  we  sometimes  maintain  additional 
information  during  step  3  so  that  we  can  easily  construct  an  optimal  solution. 

The sections that follow use the dynamic-programming method to solve some
optimization problems. Section 15.1 examines the problem of cutting a rod into
rods of smaller length in a way that maximizes their total value. Section 15.2 asks
how we can multiply a chain of matrices while performing the fewest total scalar
multiplications. Given these examples of dynamic programming, Section 15.3 discusses
two key characteristics that a problem must have for dynamic programming
to be a viable solution technique. Section 15.4 then shows how to find the longest
common subsequence of two sequences via dynamic programming. Finally, Section 15.5
uses dynamic programming to construct binary search trees that are optimal,
given a known distribution of keys to be looked up.


15.1  Rod  cutting 

Our first example uses dynamic programming to solve a simple problem in deciding
where to cut steel rods. Serling Enterprises buys long steel rods and cuts them
into  shorter  rods,  which  it  then  sells.  Each  cut  is  free.  The  management  of  Serling 
Enterprises  wants  to  know  the  best  way  to  cut  up  the  rods. 

We assume that we know, for i = 1, 2, ..., the price $p_i$ in dollars that Serling
Enterprises  charges  for  a  rod  of  length  i  inches.  Rod  lengths  are  always  an  integral 
number  of  inches.  Figure  15.1  gives  a  sample  price  table. 

The  rod-cutting  problem  is  the  following.  Given  a  rod  of  length  n  inches  and  a 
table of prices $p_i$ for i = 1, 2, ..., n, determine the maximum revenue $r_n$ obtainable
by cutting up the rod and selling the pieces. Note that if the price $p_n$ for a rod
of length n is large enough, an optimal solution may require no cutting at all.

Consider  the  case  when  n  =  4.  Figure  15.2  shows  all  the  ways  to  cut  up  a  rod 
of  4  inches  in  length,  including  the  way  with  no  cuts  at  all.  We  see  that  cutting  a 
4-inch rod into two 2-inch pieces produces revenue $p_2 + p_2 = 5 + 5 = 10$, which
is  optimal. 

We can cut up a rod of length n in $2^{n-1}$ different ways, since we have an independent
option of cutting, or not cutting, at distance i inches from the left end,


length i       1    2    3    4    5    6    7    8    9   10
price $p_i$    1    5    8    9   10   17   17   20   24   30

Figure 15.1 A sample price table for rods. Each rod of length i inches earns the company $p_i$
dollars of revenue.

Figure 15.2 The 8 possible ways of cutting up a rod of length 4. Above each piece is the
value of that piece, according to the sample price chart of Figure 15.1. The optimal strategy is
part (c), cutting the rod into two pieces of length 2, which has total value 10.


for i = 1, 2, ..., n − 1.¹ We denote a decomposition into pieces using ordinary
additive notation, so that 7 = 2 + 2 + 3 indicates that a rod of length 7 is cut into
three pieces: two of length 2 and one of length 3. If an optimal solution cuts the
rod into k pieces, for some 1 ≤ k ≤ n, then an optimal decomposition

    $n = i_1 + i_2 + \cdots + i_k$

of the rod into pieces of lengths $i_1, i_2, \ldots, i_k$ provides maximum corresponding
revenue

    $r_n = p_{i_1} + p_{i_2} + \cdots + p_{i_k}$.

¹ If we required the pieces to be cut in order of nondecreasing size, there would be fewer ways
to consider. For n = 4, we would consider only 5 such ways: parts (a), (b), (c), (e), and (h)
in Figure 15.2. The number of ways is called the partition function; it is approximately equal to
$e^{\pi\sqrt{2n/3}} / (4n\sqrt{3})$. This quantity is less than $2^{n-1}$, but still much greater than any polynomial in n.
We shall not pursue this line of inquiry further, however.

For our sample problem, we can determine the optimal revenue figures $r_i$, for
i = 1, 2, ..., 10, by inspection, with the corresponding optimal decompositions

    $r_1 = 1$     from solution 1 = 1 (no cuts),
    $r_2 = 5$     from solution 2 = 2 (no cuts),
    $r_3 = 8$     from solution 3 = 3 (no cuts),
    $r_4 = 10$    from solution 4 = 2 + 2,
    $r_5 = 13$    from solution 5 = 2 + 3,
    $r_6 = 17$    from solution 6 = 6 (no cuts),
    $r_7 = 18$    from solution 7 = 1 + 6 or 7 = 2 + 2 + 3,
    $r_8 = 22$    from solution 8 = 2 + 6,
    $r_9 = 25$    from solution 9 = 3 + 6,
    $r_{10} = 30$   from solution 10 = 10 (no cuts).

More generally, we can frame the values $r_n$ for n ≥ 1 in terms of optimal revenues
from shorter rods:

    $r_n = \max(p_n, r_1 + r_{n-1}, r_2 + r_{n-2}, \ldots, r_{n-1} + r_1)$.    (15.1)

The first argument, $p_n$, corresponds to making no cuts at all and selling the rod of
length n as is. The other n − 1 arguments to max correspond to the maximum revenue
obtained by making an initial cut of the rod into two pieces of size i and n − i,
for each i = 1, 2, ..., n − 1, and then optimally cutting up those pieces further,
obtaining revenues $r_i$ and $r_{n-i}$ from those two pieces. Since we don’t know ahead
of  time  which  value  of  i  optimizes  revenue,  we  have  to  consider  all  possible  values 
for  i  and  pick  the  one  that  maximizes  revenue.  We  also  have  the  option  of  picking 
no  i  at  all  if  we  can  obtain  more  revenue  by  selling  the  rod  uncut. 
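As a quick check of equation (15.1), the following worked instance evaluates $r_4$ using the sample prices of Figure 15.1 together with the values $r_1 = 1$, $r_2 = 5$, and $r_3 = 8$ found by inspection. It is meant only as an illustration of how the arguments to max arise.

```latex
\begin{align*}
r_4 &= \max(p_4,\; r_1 + r_3,\; r_2 + r_2,\; r_3 + r_1) \\
    &= \max(9,\; 1 + 8,\; 5 + 5,\; 8 + 1) \\
    &= 10 .
\end{align*}
```

The maximum is attained by the term $r_2 + r_2$, which agrees with the decomposition 4 = 2 + 2.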

Note that to solve the original problem of size n, we solve smaller problems of
the  same  type,  but  of  smaller  sizes.  Once  we  make  the  first  cut,  we  may  consider 
the  two  pieces  as  independent  instances  of  the  rod-cutting  problem.  The  overall 
optimal  solution  incorporates  optimal  solutions  to  the  two  related  subproblems, 
maximizing  revenue  from  each  of  those  two  pieces.  We  say  that  the  rod-cutting 
problem exhibits optimal substructure: optimal solutions to a problem incorporate
optimal  solutions  to  related  subproblems,  which  we  may  solve  independently. 

In a related, but slightly simpler, way to arrange a recursive structure for the rod-cutting
problem, we view a decomposition as consisting of a first piece of length i
cut off the left-hand end, and then a right-hand remainder of length n − i. Only
the remainder, and not the first piece, may be further divided. We may view every
decomposition of a length-n rod in this way: as a first piece followed by some
decomposition of the remainder. When doing so, we can couch the solution with
no cuts at all as saying that the first piece has size i = n and revenue $p_n$ and that
the remainder has size 0 with corresponding revenue $r_0 = 0$. We thus obtain the
following simpler version of equation (15.1):

    $r_n = \max_{1 \le i \le n} (p_i + r_{n-i})$.    (15.2)

In this formulation, an optimal solution embodies the solution to only one related
subproblem (the remainder) rather than two.

Recursive  top-down  implementation 

The  following  procedure  implements  the  computation  implicit  in  equation  (15.2) 
in  a  straightforward,  top-down,  recursive  manner. 

Cut-Rod(p, n)
1  if n == 0
2      return 0
3  q = −∞
4  for i = 1 to n
5      q = max(q, p[i] + Cut-Rod(p, n − i))
6  return q

Procedure Cut-Rod takes as input an array p[1 . . n] of prices and an integer n,
and it returns the maximum revenue possible for a rod of length n. If n = 0, no
revenue is possible, and so Cut-Rod returns 0 in line 2. Line 3 initializes the
maximum revenue q to −∞, so that the for loop in lines 4–5 correctly computes
$q = \max_{1 \le i \le n} (p_i + \text{Cut-Rod}(p, n - i))$; line 6 then returns this value. A simple
induction on n proves that this answer is equal to the desired answer $r_n$, using
equation (15.2).
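For readers who want to run the procedure, here is one possible Python transcription of Cut-Rod. It is a sketch rather than a definitive implementation: the names `cut_rod` and `PRICES` are introduced here for illustration, and the list carries an unused placeholder at index 0 so that `p[i]` lines up with the 1-based price array of the pseudocode.

```python
import math

# Sample prices from Figure 15.1. Index 0 is an unused placeholder (an
# assumption of this sketch) so that PRICES[i] is the price of a length-i rod.
PRICES = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]


def cut_rod(p, n):
    """Return the maximum revenue for a rod of length n by naive recursion."""
    if n == 0:
        return 0
    q = -math.inf
    for i in range(1, n + 1):
        # First piece of length i, then an optimally cut remainder of length n - i.
        q = max(q, p[i] + cut_rod(p, n - i))
    return q


print([cut_rod(PRICES, n) for n in range(1, 11)])
# [1, 5, 8, 10, 13, 17, 18, 22, 25, 30]
```

The printed values match the optimal revenues listed by inspection for the sample price table.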

If  you  were  to  code  up  Cut-Rod  in  your  favorite  programming  language  and  run 
it  on  your  computer,  you  would  find  that  once  the  input  size  becomes  moderately 
large,  your  program  would  take  a  long  time  to  run.  For  n  =  40,  you  would  find  that 
your  program  takes  at  least  several  minutes,  and  most  likely  more  than  an  hour.  In 
fact, you would find that each time you increase n by 1, your program’s running
time  would  approximately  double. 

Why  is  Cut-Rod  so  inefficient?  The  problem  is  that  Cut-Rod  calls  itself 
recursively  over  and  over  again  with  the  same  parameter  values;  it  solves  the 
same  subproblems  repeatedly.  Figure  15.3  illustrates  what  happens  for  n  =  4: 
Cut-Rod(p, n) calls Cut-Rod(p, n − i) for i = 1, 2, ..., n. Equivalently,
Cut-Rod(p, n) calls Cut-Rod(p, j) for each j = 0, 1, ..., n − 1. When this
process unfolds recursively, the amount of work done, as a function of n, grows
explosively. 

Figure 15.3 The recursion tree showing recursive calls resulting from a call Cut-Rod(p, n) for
n = 4. Each node label gives the size n of the corresponding subproblem, so that an edge from
a parent with label s to a child with label t corresponds to cutting off an initial piece of size s − t
and leaving a remaining subproblem of size t. A path from the root to a leaf corresponds to one of
the $2^{n-1}$ ways of cutting up a rod of length n. In general, this recursion tree has $2^n$ nodes and $2^{n-1}$
leaves.

To analyze the running time of Cut-Rod, let T(n) denote the total number of
calls made to Cut-Rod when called with its second parameter equal to n. This
expression equals the number of nodes in a subtree whose root is labeled n in the
recursion tree. The count includes the initial call at its root. Thus, T(0) = 1 and

    $T(n) = 1 + \sum_{j=0}^{n-1} T(j)$.    (15.3)

The initial 1 is for the call at the root, and the term T(j) counts the number of calls
(including recursive calls) due to the call Cut-Rod(p, n − i), where j = n − i.
As Exercise 15.1-1 asks you to show,

    $T(n) = 2^n$,    (15.4)

and  so  the  running  time  of  Cut-Rod  is  exponential  in  n. 
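The recurrence (15.3) can also be explored empirically. The sketch below, with the hypothetical helper name `count_calls`, mirrors the call structure of Cut-Rod but only counts invocations; it does not replace the proof requested in Exercise 15.1-1.

```python
def count_calls(n):
    """Count the calls Cut-Rod makes for a rod of length n, including the first."""
    calls = 1  # the call at the root
    for i in range(1, n + 1):
        # Each loop iteration triggers a recursive call on size n - i.
        calls += count_calls(n - i)
    return calls


print([count_calls(n) for n in range(6)])  # [1, 2, 4, 8, 16, 32]
```

For these small inputs the counts agree with equation (15.4).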

In retrospect, this exponential running time is not so surprising. Cut-Rod explicitly
considers all the $2^{n-1}$ possible ways of cutting up a rod of length n. The
tree of recursive calls has $2^{n-1}$ leaves, one for each possible way of cutting up the
rod.  The  labels  on  the  simple  path  from  the  root  to  a  leaf  give  the  sizes  of  each 
remaining  right-hand  piece  before  making  each  cut.  That  is,  the  labels  give  the 
corresponding  cut  points,  measured  from  the  right-hand  end  of  the  rod. 

Using  dynamic  programming  for  optimal  rod  cutting 

We  now  show  how  to  convert  CUT-ROD  into  an  efficient  algorithm,  using  dynamic 
programming. 

The dynamic-programming method works as follows. Having observed that a
naive recursive solution is inefficient because it solves the same subproblems repeatedly,
we arrange for each subproblem to be solved only once, saving its solution.
If we need to refer to this subproblem’s solution again later, we can just look it
up, rather than recompute it. Dynamic programming thus uses additional memory
to save computation time; it serves as an example of a time-memory trade-off. The
savings may be dramatic: an exponential-time solution may be transformed into a
polynomial-time solution. A dynamic-programming approach runs in polynomial
time when the number of distinct subproblems involved is polynomial in the input
size and we can solve each such subproblem in polynomial time.

There are usually two equivalent ways to implement a dynamic-programming
approach.  We  shall  illustrate  both  of  them  with  our  rod-cutting  example. 

The first approach is top-down with memoization.² In this approach, we write
the procedure recursively in a natural manner, but modified to save the result of
each subproblem (usually in an array or hash table). The procedure now first checks
to see whether it has previously solved this subproblem. If so, it returns the saved
value, saving further computation at this level; if not, the procedure computes the
value in the usual manner. We say that the recursive procedure has been memoized;
it “remembers” what results it has computed previously.

The  second  approach  is  the  bottom-up  method.  This  approach  typically  depends 
on some natural notion of the “size” of a subproblem, such that solving any particular
subproblem depends only on solving “smaller” subproblems. We sort the
subproblems by size and solve them in size order, smallest first. When solving a
particular subproblem, we have already solved all of the smaller subproblems its
solution depends upon, and we have saved their solutions. We solve each subproblem
only once, and when we first see it, we have already solved all of its
prerequisite  subproblems. 

These  two  approaches  yield  algorithms  with  the  same  asymptotic  running  time, 
except  in  unusual  circumstances  where  the  top-down  approach  does  not  actually 
recurse  to  examine  all  possible  subproblems.  The  bottom-up  approach  often  has 
much  better  constant  factors,  since  it  has  less  overhead  for  procedure  calls. 

Here is the pseudocode for the top-down Cut-Rod procedure, with memoization
added:

Memoized-Cut-Rod(p, n)
1  let r[0 . . n] be a new array
2  for i = 0 to n
3      r[i] = −∞
4  return Memoized-Cut-Rod-Aux(p, n, r)


² This is not a misspelling. The word really is memoization, not memorization. Memoization comes
from memo, since the technique consists of recording a value so that we can look it up later.

Memoized-Cut-Rod-Aux(p, n, r)
1  if r[n] ≥ 0
2      return r[n]
3  if n == 0
4      q = 0
5  else q = −∞
6      for i = 1 to n
7          q = max(q, p[i] + Memoized-Cut-Rod-Aux(p, n − i, r))
8  r[n] = q
9  return q

Here, the main procedure Memoized-Cut-Rod initializes a new auxiliary array
r[0 . . n] with the value −∞, a convenient choice with which to denote “unknown.”
(Known revenue values are always nonnegative.) It then calls its helper
routine,  Memoized-Cut-Rod-Aux. 

The  procedure  Memoized-Cut-Rod-Aux  is  just  the  memoized  version  of  our 
previous  procedure,  Cut-Rod.  It  first  checks  in  line  1  to  see  whether  the  desired 
value  is  already  known  and,  if  it  is,  then  line  2  returns  it.  Otherwise,  lines  3-7 
compute the desired value q in the usual manner, line 8 saves it in r[n], and line 9
returns  it. 
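The memoized procedures can be transcribed into Python along the following lines. This is a sketch under the same assumptions as before: `PRICES` and the function names are introduced here for illustration, and index 0 of the price list is an unused placeholder. Note also that deep recursion in Python is bounded by the interpreter's recursion limit, so this form suits moderate n only.

```python
import math

# Sample prices from Figure 15.1; index 0 is an unused placeholder.
PRICES = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]


def memoized_cut_rod(p, n):
    """Top-down rod cutting with memoization."""
    r = [-math.inf] * (n + 1)  # -inf marks "unknown"
    return memoized_cut_rod_aux(p, n, r)


def memoized_cut_rod_aux(p, n, r):
    if r[n] >= 0:  # already solved: known revenues are nonnegative
        return r[n]
    if n == 0:
        q = 0
    else:
        q = -math.inf
        for i in range(1, n + 1):
            q = max(q, p[i] + memoized_cut_rod_aux(p, n - i, r))
    r[n] = q  # save the result before returning it
    return q


print(memoized_cut_rod(PRICES, 10))  # 30
```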

The  bottom-up  version  is  even  simpler: 

Bottom-Up-Cut-Rod(p, n)
1  let r[0 . . n] be a new array
2  r[0] = 0
3  for j = 1 to n
4      q = −∞
5      for i = 1 to j
6          q = max(q, p[i] + r[j − i])
7      r[j] = q
8  return r[n]

For the bottom-up dynamic-programming approach, Bottom-Up-Cut-Rod
uses the natural ordering of the subproblems: a problem of size i is “smaller”
than a subproblem of size j if i < j. Thus, the procedure solves subproblems of
sizes j = 0, 1, ..., n, in that order.

Figure 15.4 The subproblem graph for the rod-cutting problem with n = 4. The vertex labels
give the sizes of the corresponding subproblems. A directed edge (x, y) indicates that we need a
solution to subproblem y when solving subproblem x. This graph is a reduced version of the tree of
Figure 15.3, in which all nodes with the same label are collapsed into a single vertex and all edges
go from parent to child.

Line 1 of procedure Bottom-Up-Cut-Rod creates a new array r[0 . . n] in
which to save the results of the subproblems, and line 2 initializes r[0] to 0, since
a rod of length 0 earns no revenue. Lines 3–6 solve each subproblem of size j, for
j = 1, 2, ..., n, in order of increasing size. The approach used to solve a problem
of a particular size j is the same as that used by Cut-Rod, except that line 6 now
directly references array entry r[j − i] instead of making a recursive call to solve
the subproblem of size j − i. Line 7 saves in r[j] the solution to the subproblem
of size j. Finally, line 8 returns r[n], which equals the optimal value $r_n$.
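A Python rendering of the bottom-up procedure might look as follows. As with the earlier sketches, `PRICES` and `bottom_up_cut_rod` are names chosen for this example, and index 0 of the price list is an unused placeholder.

```python
import math

# Sample prices from Figure 15.1; index 0 is an unused placeholder.
PRICES = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]


def bottom_up_cut_rod(p, n):
    """Bottom-up rod cutting: solve sizes 1..n in increasing order."""
    r = [0] * (n + 1)  # r[0] = 0: a rod of length 0 earns no revenue
    for j in range(1, n + 1):
        q = -math.inf
        for i in range(1, j + 1):
            # r[j - i] was filled in by an earlier iteration of the outer loop.
            q = max(q, p[i] + r[j - i])
        r[j] = q
    return r[n]


print(bottom_up_cut_rod(PRICES, 10))  # 30
```

Because the table is filled iteratively, this version avoids the procedure-call overhead of the recursive forms.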

The  bottom-up  and  top-down  versions  have  the  same  asymptotic  running  time. 
The running time of procedure Bottom-Up-Cut-Rod is $\Theta(n^2)$, due to its
doubly-nested loop structure. The number of iterations of its inner for loop, in
lines 5–6, forms an arithmetic series. The running time of its top-down counterpart,
Memoized-Cut-Rod, is also $\Theta(n^2)$, although this running time may be a little
harder to see. Because a recursive call to solve a previously solved subproblem
returns immediately, Memoized-Cut-Rod solves each subproblem just once. It
solves subproblems for sizes 0, 1, ..., n. To solve a subproblem of size n, the for
loop of lines 6–7 iterates n times. Thus, the total number of iterations of this for
loop, over all recursive calls of Memoized-Cut-Rod, forms an arithmetic series,
giving a total of $\Theta(n^2)$ iterations, just like the inner for loop of Bottom-Up-Cut-Rod.
(We actually are using a form of aggregate analysis here. We shall see
aggregate  analysis  in  detail  in  Section  17.1.) 
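The arithmetic series mentioned above can be written out explicitly. Counting one unit of work per iteration of the inner loop, and summing over subproblem sizes j = 1, 2, ..., n, gives:

```latex
\sum_{j=1}^{n} j \;=\; \frac{n(n+1)}{2} \;=\; \Theta(n^2)
```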

Subproblem  graphs 

When  we  think  about  a  dynamic-programming  problem,  we  should  understand  the 
set  of  subproblems  involved  and  how  subproblems  depend  on  one  another. 

The subproblem graph for the problem embodies exactly this information. Figure 15.4
shows the subproblem graph for the rod-cutting problem with n = 4. It
is a directed graph, containing one vertex for each distinct subproblem. The subproblem
graph has a directed edge from the vertex for subproblem x to the vertex
for subproblem y if determining an optimal solution for subproblem x involves
directly considering an optimal solution for subproblem y. For example, the subproblem
graph contains an edge from x to y if a top-down recursive procedure for
solving x directly calls itself to solve y. We can think of the subproblem graph
as a “reduced” or “collapsed” version of the recursion tree for the top-down recursive
method, in which we coalesce all nodes for the same subproblem into a single
vertex and direct all edges from parent to child.

The  bottom-up  method  for  dynamic  programming  considers  the  vertices  of  the 
subproblem  graph  in  such  an  order  that  we  solve  the  subproblems  y  adjacent  to 
a  given  subproblem  x  before  we  solve  subproblem  x.  (Recall  from  Section  B.4 
that  the  adjacency  relation  is  not  necessarily  symmetric.)  Using  the  terminology 
from Chapter 22, in a bottom-up dynamic-programming algorithm, we consider the
vertices  of  the  subproblem  graph  in  an  order  that  is  a  “reverse  topological  sort,”  or 
a  “topological  sort  of  the  transpose”  (see  Section  22.4)  of  the  subproblem  graph.  In 
other  words,  no  subproblem  is  considered  until  all  of  the  subproblems  it  depends 
upon  have  been  solved.  Similarly,  using  notions  from  the  same  chapter,  we  can 
view  the  top-down  method  (with  memoization)  for  dynamic  programming  as  a 
“depth-first  search”  of  the  subproblem  graph  (see  Section  22.3). 

The  size  of  the  subproblem  graph  G  =  (V,  E)  can  help  us  determine  the  running 
time  of  the  dynamic  programming  algorithm.  Since  we  solve  each  subproblem  just 
once,  the  running  time  is  the  sum  of  the  times  needed  to  solve  each  subproblem. 
Typically,  the  time  to  compute  the  solution  to  a  subproblem  is  proportional  to  the 
degree  (number  of  outgoing  edges)  of  the  corresponding  vertex  in  the  subproblem 
graph, and the number of subproblems is equal to the number of vertices in the subproblem
graph. In this common case, the running time of dynamic programming
is  linear  in  the  number  of  vertices  and  edges. 
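To make the vertex and edge counts concrete, the following sketch builds the rod-cutting subproblem graph as adjacency lists. The function name `rod_cutting_subproblem_graph` and the dictionary representation are assumptions of this example; the edges follow the dependency of size j on each smaller size j − i.

```python
def rod_cutting_subproblem_graph(n):
    """Map each subproblem size j to the sizes j - i (1 <= i <= j) it depends on."""
    return {j: [j - i for i in range(1, j + 1)] for j in range(n + 1)}


graph = rod_cutting_subproblem_graph(4)
num_vertices = len(graph)
num_edges = sum(len(adjacent) for adjacent in graph.values())

print(graph)  # {0: [], 1: [0], 2: [1, 0], 3: [2, 1, 0], 4: [3, 2, 1, 0]}
print(num_vertices, num_edges)  # 5 10
```

In general this graph has n + 1 vertices and n(n + 1)/2 edges, consistent with the quadratic running time of the rod-cutting procedures.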

Reconstructing  a  solution 

Our  dynamic-programming  solutions  to  the  rod-cutting  problem  return  the  value  of 
an  optimal  solution,  but  they  do  not  return  an  actual  solution:  a  list  of  piece  sizes. 
We  can  extend  the  dynamic-programming  approach  to  record  not  only  the  optimal 
value  computed  for  each  subproblem,  but  also  a  choice  that  led  to  the  optimal 
value.  With  this  information,  we  can  readily  print  an  optimal  solution. 

Here is an extended version of Bottom-Up-Cut-Rod that computes, for each
rod size j, not only the maximum revenue $r_j$, but also $s_j$, the optimal size of the
first piece to cut off:

Extended-Bottom-Up-Cut-Rod(p, n)
1  let r[0 . . n] and s[0 . . n] be new arrays
2  r[0] = 0
3  for j = 1 to n
4      q = −∞
5      for i = 1 to j
6          if q < p[i] + r[j − i]
7              q = p[i] + r[j − i]
8              s[j] = i
9      r[j] = q
10 return r and s

This procedure is similar to Bottom-Up-Cut-Rod, except that it creates the array
s in line 1, and it updates s[j] in line 8 to hold the optimal size i of the first
piece to cut off when solving a subproblem of size j.

The following procedure takes a price table p and a rod size n, and it calls
Extended-Bottom-Up-Cut-Rod to compute the array s[1 . . n] of optimal
first-piece sizes and then prints out the complete list of piece sizes in an optimal
decomposition of a rod of length n:

Print-Cut-Rod-Solution(p, n)
1  (r, s) = Extended-Bottom-Up-Cut-Rod(p, n)
2  while n > 0
3      print s[n]
4      n = n − s[n]

In our rod-cutting example, the call Extended-Bottom-Up-Cut-Rod(p, 10)
would return the following arrays:

i       0    1    2    3    4    5    6    7    8    9   10
r[i]    0    1    5    8   10   13   17   18   22   25   30
s[i]    0    1    2    3    2    2    6    1    2    3   10

A call to Print-Cut-Rod-Solution(p, 10) would print just 10, but a call with
n = 7 would print the cuts 1 and 6, corresponding to the first optimal decomposition
for $r_7$ given earlier.
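The two procedures above can be sketched in Python as shown below. The names are illustrative, index 0 of the price list is an unused placeholder, and, as a deliberate departure from the pseudocode, `cut_rod_solution` returns the list of piece sizes instead of printing them, which makes the result easier to check.

```python
import math

# Sample prices from Figure 15.1; index 0 is an unused placeholder.
PRICES = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]


def extended_bottom_up_cut_rod(p, n):
    """Return (r, s): optimal revenues and optimal first-piece sizes for 0..n."""
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    for j in range(1, n + 1):
        q = -math.inf
        for i in range(1, j + 1):
            if q < p[i] + r[j - i]:
                q = p[i] + r[j - i]
                s[j] = i  # remember the first piece that achieved q
        r[j] = q
    return r, s


def cut_rod_solution(p, n):
    """Return the piece sizes of one optimal decomposition of a length-n rod."""
    _, s = extended_bottom_up_cut_rod(p, n)
    pieces = []
    while n > 0:
        pieces.append(s[n])
        n -= s[n]
    return pieces


print(cut_rod_solution(PRICES, 10))  # [10]
print(cut_rod_solution(PRICES, 7))   # [1, 6]
```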

Exercises 


15.1-1 

Show that equation (15.4) follows from equation (15.3) and the initial condition
T(0) = 1.

15.1-2

Show, by means of a counterexample, that the following “greedy” strategy does
not always determine an optimal way to cut rods. Define the density of a rod of
length i to be $p_i / i$, that is, its value per inch. The greedy strategy for a rod of
length n cuts off a first piece of length i, where 1 ≤ i ≤ n, having maximum
density. It then continues by applying the greedy strategy to the remaining piece of
length n − i.


15.1-3

Consider a modification of the rod-cutting problem in which, in addition to a
price $p_i$ for each rod, each cut incurs a fixed cost of c. The revenue associated with
a solution is now the sum of the prices of the pieces minus the costs of making the
cuts. Give a dynamic-programming algorithm to solve this modified problem.

15.1-4

Modify  Memoized-Cut-Rod  to  return  not  only  the  value  but  the  actual  solution, 
too. 


15.1-5 

The Fibonacci numbers are defined by recurrence (3.22). Give an O(n)-time
dynamic-programming algorithm to compute the nth Fibonacci number. Draw the
subproblem  graph.  How  many  vertices  and  edges  are  in  the  graph? 


15.2  Matrix-chain  multiplication 

Our next example of dynamic programming is an algorithm that solves the problem
of matrix-chain multiplication. We are given a sequence (chain) ⟨A_1, A_2, ..., A_n⟩
of n matrices to be multiplied, and we wish to compute the product

\[ A_1 A_2 \cdots A_n . \tag{15.5} \]

We can evaluate the expression (15.5) using the standard algorithm for multiplying
pairs of matrices as a subroutine once we have parenthesized it to resolve all
ambiguities in how the matrices are multiplied together. Matrix multiplication is
associative, and so all parenthesizations yield the same product. A product of
matrices is fully parenthesized if it is either a single matrix or the product of two fully
parenthesized matrix products, surrounded by parentheses. For example, if the
chain of matrices is ⟨A_1, A_2, A_3, A_4⟩, then we can fully parenthesize the product
A_1 A_2 A_3 A_4 in five distinct ways:
(A_1(A_2(A_3A_4))),
(A_1((A_2A_3)A_4)),
((A_1A_2)(A_3A_4)),
((A_1(A_2A_3))A_4),
(((A_1A_2)A_3)A_4).

How  we  parenthesize  a  chain  of  matrices  can  have  a  dramatic  impact  on  the  cost 
of  evaluating  the  product.  Consider  first  the  cost  of  multiplying  two  matrices.  The 
standard  algorithm  is  given  by  the  following  pseudocode,  which  generalizes  the 
Square-Matrix-Multiply  procedure  from  Section  4.2.  The  attributes  rows 
and  columns  are  the  numbers  of  rows  and  columns  in  a  matrix. 

Matrix-Multiply(A, B)
1  if A.columns ≠ B.rows
2      error “incompatible dimensions”
3  else let C be a new A.rows × B.columns matrix
4      for i = 1 to A.rows
5          for j = 1 to B.columns
6              c_ij = 0
7              for k = 1 to A.columns
8                  c_ij = c_ij + a_ik · b_kj
9      return C

We can multiply two matrices A and B only if they are compatible: the number of
columns of A must equal the number of rows of B. If A is a p × q matrix and B is
a q × r matrix, the resulting matrix C is a p × r matrix. The time to compute C is
dominated by the number of scalar multiplications in line 8, which is pqr. In what
follows, we shall express costs in terms of the number of scalar multiplications.
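As a rough illustration of the pqr count, the following Python sketch mirrors the pseudocode for matrices stored as lists of rows and tallies the scalar multiplications it performs. The representation and the returned counter are assumptions made for this example only.

```python
def matrix_multiply(a, b):
    """Multiply a (p x q) by b (q x r), both given as lists of rows.
    Returns (c, count), where count is the number of scalar multiplications."""
    p, q = len(a), len(a[0])
    if q != len(b):
        raise ValueError("incompatible dimensions")
    r = len(b[0])
    c = [[0] * r for _ in range(p)]
    count = 0
    for i in range(p):
        for j in range(r):
            for k in range(q):
                c[i][j] += a[i][k] * b[k][j]
                count += 1          # one scalar multiplication per inner step
    return c, count
```

For a 2 × 3 matrix times a 3 × 2 matrix, the counter comes out to 2 · 3 · 2 = 12, as the pqr formula predicts.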

To illustrate the different costs incurred by different parenthesizations of a matrix
product, consider the problem of a chain ⟨A_1, A_2, A_3⟩ of three matrices. Suppose
that the dimensions of the matrices are 10 × 100, 100 × 5, and 5 × 50, respectively.
If we multiply according to the parenthesization ((A_1 A_2) A_3), we perform
10 · 100 · 5 = 5000 scalar multiplications to compute the 10 × 5 matrix product
A_1 A_2, plus another 10 · 5 · 50 = 2500 scalar multiplications to multiply this
matrix by A_3, for a total of 7500 scalar multiplications. If instead we multiply
according to the parenthesization (A_1 (A_2 A_3)), we perform 100 · 5 · 50 = 25,000
scalar multiplications to compute the 100 × 50 matrix product A_2 A_3, plus another
10 · 100 · 50 = 50,000 scalar multiplications to multiply A_1 by this matrix, for a
total of 75,000 scalar multiplications. Thus, computing the product according to
the first parenthesization is 10 times faster.
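The arithmetic in this example can be checked with a few lines of Python. The sketch below represents each matrix only by its (rows, columns) shape; the helper name multiply_cost is an assumption introduced for illustration.

```python
def multiply_cost(left, right):
    """Scalar-multiplication cost and result shape for (p x q) times (q x r)."""
    (p, q), (q2, r) = left, right
    if q != q2:
        raise ValueError("incompatible dimensions")
    return p * q * r, (p, r)

A1, A2, A3 = (10, 100), (100, 5), (5, 50)

# ((A1 A2) A3): multiply A1 by A2 first, then the result by A3.
cost_12, shape_12 = multiply_cost(A1, A2)
cost_12_3, _ = multiply_cost(shape_12, A3)
left_first = cost_12 + cost_12_3

# (A1 (A2 A3)): multiply A2 by A3 first, then A1 by the result.
cost_23, shape_23 = multiply_cost(A2, A3)
cost_1_23, _ = multiply_cost(A1, shape_23)
right_first = cost_23 + cost_1_23
```

Here left_first and right_first correspond to the two totals discussed in the text.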

We state the matrix-chain multiplication problem as follows: given a chain
⟨A_1, A_2, ..., A_n⟩ of n matrices, where for i = 1, 2, ..., n, matrix A_i has dimension
p_{i-1} × p_i, fully parenthesize the product A_1 A_2 ⋯ A_n in a way that minimizes the
number of scalar multiplications.

Note that in the matrix-chain multiplication problem, we are not actually
multiplying matrices. Our goal is only to determine an order for multiplying matrices
that has the lowest cost. Typically, the time invested in determining this optimal
order is more than paid for by the time saved later on when actually performing the
matrix multiplications (such as performing only 7500 scalar multiplications instead
of 75,000).


Counting  the  number  of  parenthesizations 


Before solving the matrix-chain multiplication problem by dynamic programming,
let us convince ourselves that exhaustively checking all possible parenthesizations
does not yield an efficient algorithm. Denote the number of alternative
parenthesizations of a sequence of n matrices by P(n). When n = 1, we have just one
matrix and therefore only one way to fully parenthesize the matrix product. When
n ≥ 2, a fully parenthesized matrix product is the product of two fully
parenthesized matrix subproducts, and the split between the two subproducts may occur
between the kth and (k + 1)st matrices for any k = 1, 2, ..., n − 1. Thus, we
obtain the recurrence


\[
P(n) =
\begin{cases}
1 & \text{if } n = 1 , \\
\sum_{k=1}^{n-1} P(k)\, P(n-k) & \text{if } n \ge 2 .
\end{cases}
\tag{15.6}
\]


Problem 12-4 asked you to show that the solution to a similar recurrence is the
sequence of Catalan numbers, which grows as Ω(4^n / n^{3/2}). A simpler exercise
(see Exercise 15.2-3) is to show that the solution to the recurrence (15.6) is Ω(2^n).
The number of solutions is thus exponential in n, and the brute-force method of
exhaustive search makes for a poor strategy when determining how to optimally
parenthesize a matrix chain.
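To get a feel for how quickly P(n) grows, recurrence (15.6) can be evaluated directly. The Python sketch below is only an illustration; it caches results with functools.lru_cache so that each P(k) is computed once, and the function name is an assumption.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def num_parenthesizations(n):
    """P(n) from recurrence (15.6): the number of full parenthesizations
    of a chain of n matrices."""
    if n == 1:
        return 1
    # Split between the kth and (k+1)st matrices for k = 1, ..., n-1.
    return sum(num_parenthesizations(k) * num_parenthesizations(n - k)
               for k in range(1, n))
```

For n = 4 the function returns 5, matching the five parenthesizations of A_1 A_2 A_3 A_4 listed earlier.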


Applying  dynamic  programming 

We shall use the dynamic-programming method to determine how to optimally
parenthesize a matrix chain. In so doing, we shall follow the four-step sequence
that we stated at the beginning of this chapter:

1. Characterize the structure of an optimal solution.

2. Recursively define the value of an optimal solution.

3. Compute the value of an optimal solution.

4. Construct an optimal solution from computed information.

We  shall  go  through  these  steps  in  order,  demonstrating  clearly  how  we  apply  each 
step  to  the  problem. 

Step  1:  The  structure  of  an  optimal  parenthesization 

For our first step in the dynamic-programming paradigm, we find the optimal
substructure and then use it to construct an optimal solution to the problem from
optimal solutions to subproblems. In the matrix-chain multiplication problem, we can
perform this step as follows. For convenience, let us adopt the notation A_{i..j}, where
i ≤ j, for the matrix that results from evaluating the product A_i A_{i+1} ⋯ A_j.
Observe that if the problem is nontrivial, i.e., i < j, then to parenthesize the product
A_i A_{i+1} ⋯ A_j, we must split the product between A_k and A_{k+1} for some integer k
in the range i ≤ k < j. That is, for some value of k, we first compute the matrices
A_{i..k} and A_{k+1..j} and then multiply them together to produce the final product A_{i..j}.
The cost of parenthesizing this way is the cost of computing the matrix A_{i..k}, plus
the cost of computing A_{k+1..j}, plus the cost of multiplying them together.

The optimal substructure of this problem is as follows. Suppose that to
optimally parenthesize A_i A_{i+1} ⋯ A_j, we split the product between A_k and A_{k+1}.
Then the way we parenthesize the “prefix” subchain A_i A_{i+1} ⋯ A_k within this
optimal parenthesization of A_i A_{i+1} ⋯ A_j must be an optimal parenthesization of
A_i A_{i+1} ⋯ A_k. Why? If there were a less costly way to parenthesize A_i A_{i+1} ⋯ A_k,
then we could substitute that parenthesization in the optimal parenthesization
of A_i A_{i+1} ⋯ A_j to produce another way to parenthesize A_i A_{i+1} ⋯ A_j whose cost
was lower than the optimum: a contradiction. A similar observation holds for how
we parenthesize the subchain A_{k+1} A_{k+2} ⋯ A_j in the optimal parenthesization of
A_i A_{i+1} ⋯ A_j: it must be an optimal parenthesization of A_{k+1} A_{k+2} ⋯ A_j.

Now we use our optimal substructure to show that we can construct an optimal
solution to the problem from optimal solutions to subproblems. We have seen that
any solution to a nontrivial instance of the matrix-chain multiplication problem
requires us to split the product, and that any optimal solution contains within it
optimal solutions to subproblem instances. Thus, we can build an optimal solution to
an instance of the matrix-chain multiplication problem by splitting the problem into
two subproblems (optimally parenthesizing A_i A_{i+1} ⋯ A_k and A_{k+1} A_{k+2} ⋯ A_j),
finding optimal solutions to subproblem instances, and then combining these
optimal subproblem solutions. We must ensure that when we search for the correct
place to split the product, we have considered all possible places, so that we are
sure of having examined the optimal one.
Step 2: A recursive solution


Next, we define the cost of an optimal solution recursively in terms of the optimal
solutions to subproblems. For the matrix-chain multiplication problem, we pick as
our subproblems the problems of determining the minimum cost of parenthesizing
A_i A_{i+1} ⋯ A_j for 1 ≤ i ≤ j ≤ n. Let m[i, j] be the minimum number of scalar
multiplications needed to compute the matrix A_{i..j}; for the full problem, the
lowest-cost way to compute A_{1..n} would thus be m[1, n].

We can define m[i, j] recursively as follows. If i = j, the problem is trivial;
the chain consists of just one matrix A_{i..i} = A_i, so that no scalar multiplications
are necessary to compute the product. Thus, m[i, i] = 0 for i = 1, 2, ..., n. To
compute m[i, j] when i < j, we take advantage of the structure of an optimal
solution from step 1. Let us assume that to optimally parenthesize, we split the
product A_i A_{i+1} ⋯ A_j between A_k and A_{k+1}, where i ≤ k < j. Then, m[i, j]
equals the minimum cost for computing the subproducts A_{i..k} and A_{k+1..j}, plus the
cost of multiplying these two matrices together. Recalling that each matrix A_i is
p_{i-1} × p_i, we see that computing the matrix product A_{i..k} A_{k+1..j} takes p_{i-1} p_k p_j
scalar multiplications. Thus, we obtain

\[ m[i, j] = m[i, k] + m[k+1, j] + p_{i-1} p_k p_j . \]

This recursive equation assumes that we know the value of k, which we do not.
There are only j − i possible values for k, however, namely k = i, i + 1, ..., j − 1.
Since the optimal parenthesization must use one of these values for k, we need only
check them all to find the best. Thus, our recursive definition for the minimum cost
of parenthesizing the product A_i A_{i+1} ⋯ A_j becomes


\[
m[i, j] =
\begin{cases}
0 & \text{if } i = j , \\
\min_{i \le k < j} \{\, m[i, k] + m[k+1, j] + p_{i-1} p_k p_j \,\} & \text{if } i < j .
\end{cases}
\tag{15.7}
\]


The m[i, j] values give the costs of optimal solutions to subproblems, but they
do not provide all the information we need to construct an optimal solution. To
help us do so, we define s[i, j] to be a value of k at which we split the product
A_i A_{i+1} ⋯ A_j in an optimal parenthesization. That is, s[i, j] equals a value k such
that m[i, j] = m[i, k] + m[k + 1, j] + p_{i-1} p_k p_j.


Step  3:  Computing  the  optimal  costs 

At this point, we could easily write a recursive algorithm based on recurrence (15.7)
to compute the minimum cost m[1, n] for multiplying A_1 A_2 ⋯ A_n. As we saw for
the rod-cutting problem, and as we shall see in Section 15.3, this recursive
algorithm takes exponential time, which is no better than the brute-force method of
checking each way of parenthesizing the product.
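A direct transcription of recurrence (15.7) into Python makes the exponential behavior visible. The sketch below is an illustration, not the procedure the text develops: it recomputes subproblems freely and uses a one-element list, counter, as an assumed device for counting calls. The dimension sequence in the usage note is the one from Figure 15.5.

```python
import math

def recursive_matrix_chain(p, i, j, counter):
    """Minimum scalar multiplications for A_i..A_j by recurrence (15.7),
    with no table of saved results. counter[0] tallies the calls made."""
    counter[0] += 1
    if i == j:
        return 0
    best = math.inf
    for k in range(i, j):            # try every split point i <= k < j
        q = (recursive_matrix_chain(p, i, k, counter)
             + recursive_matrix_chain(p, k + 1, j, counter)
             + p[i - 1] * p[k] * p[j])
        best = min(best, q)
    return best
```

With p = [30, 35, 15, 5, 10, 20, 25], the call recursive_matrix_chain(p, 1, 6, counter) returns 15125 after 243 calls. For this particular transcription, the call count on a chain of n matrices is 3^(n−1), since each call on a chain of length n triggers two calls on every shorter prefix and suffix length.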
Observe that we have relatively few distinct subproblems: one subproblem for
each choice of i and j satisfying 1 ≤ i ≤ j ≤ n, or (n choose 2) + n = Θ(n^2) in all.
A recursive algorithm may encounter each subproblem many times in different
branches of its recursion tree. This property of overlapping subproblems is the
second hallmark of when dynamic programming applies (the first hallmark being
optimal substructure).

Instead of computing the solution to recurrence (15.7) recursively, we compute
the optimal cost by using a tabular, bottom-up approach. (We present the
corresponding top-down approach using memoization in Section 15.3.)

We shall implement the tabular, bottom-up method in the procedure
Matrix-Chain-Order, which appears below. This procedure assumes that matrix A_i
has dimensions p_{i-1} × p_i for i = 1, 2, ..., n. Its input is a sequence
p = ⟨p_0, p_1, ..., p_n⟩, where p.length = n + 1. The procedure uses an auxiliary
table m[1..n, 1..n] for storing the m[i, j] costs and another auxiliary table
s[1..n−1, 2..n] that records which index of k achieved the optimal cost in
computing m[i, j]. We shall use the table s to construct an optimal solution.

In order to implement the bottom-up approach, we must determine which entries
of the table we refer to when computing m[i, j]. Equation (15.7) shows that the
cost m[i, j] of computing a matrix-chain product of j − i + 1 matrices depends only
on the costs of computing matrix-chain products of fewer than j − i + 1 matrices.
That is, for k = i, i + 1, ..., j − 1, the matrix A_{i..k} is a product of k − i + 1 <
j − i + 1 matrices and the matrix A_{k+1..j} is a product of j − k < j − i + 1
matrices. Thus, the algorithm should fill in the table m in a manner that corresponds
to solving the parenthesization problem on matrix chains of increasing length. For
the subproblem of optimally parenthesizing the chain A_i A_{i+1} ⋯ A_j, we consider
the subproblem size to be the length j − i + 1 of the chain.

Matrix-Chain-Order(p)
1   n = p.length − 1
2   let m[1..n, 1..n] and s[1..n−1, 2..n] be new tables
3   for i = 1 to n
4       m[i, i] = 0
5   for l = 2 to n            // l is the chain length
6       for i = 1 to n − l + 1
7           j = i + l − 1
8           m[i, j] = ∞
9           for k = i to j − 1
10              q = m[i, k] + m[k + 1, j] + p_{i-1} p_k p_j
11              if q < m[i, j]
12                  m[i, j] = q
13                  s[i, j] = k
14  return m and s
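The pseudocode translates almost line for line into Python. The sketch below is one possible rendering under a stated assumption: both tables are allocated as (n + 1) × (n + 1) lists with row and column 0 left unused, so that the 1-based indices of the text carry over unchanged.

```python
import math

def matrix_chain_order(p):
    """Bottom-up computation of the m and s tables for dimensions p[0..n].
    m[i][j] is the minimum cost for A_i..A_j; s[i][j] is an optimal split k.
    Index 0 of each table is padding so that indices match the text."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]   # m[i][i] = 0 already
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for l in range(2, n + 1):                   # l is the chain length
        for i in range(1, n - l + 2):
            j = i + l - 1
            m[i][j] = math.inf
            for k in range(i, j):
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s
```

Calling matrix_chain_order([30, 35, 15, 5, 10, 20, 25]) gives m[1][6] equal to 15125 and m[2][5] equal to 7125, in agreement with the values quoted for Figure 15.5.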
Figure 15.5 The m and s tables computed by Matrix-Chain-Order for n = 6 and the
following matrix dimensions:

matrix      A_1      A_2      A_3     A_4     A_5      A_6
dimension   30 × 35  35 × 15  15 × 5  5 × 10  10 × 20  20 × 25

The tables are rotated so that the main diagonal runs horizontally. The m table uses only the main
diagonal and upper triangle, and the s table uses only the upper triangle. The minimum number of
scalar multiplications to multiply the 6 matrices is m[1, 6] = 15,125. Of the darker entries, the pairs
that have the same shading are taken together in line 10 when computing

\[
m[2,5] = \min \left\{
\begin{array}{l}
m[2,2] + m[3,5] + p_1 p_2 p_5 = 0 + 2500 + 35 \cdot 15 \cdot 20 = 13{,}000 , \\
m[2,3] + m[4,5] + p_1 p_3 p_5 = 2625 + 1000 + 35 \cdot 5 \cdot 20 = 7125 , \\
m[2,4] + m[5,5] + p_1 p_4 p_5 = 4375 + 0 + 35 \cdot 10 \cdot 20 = 11{,}375
\end{array}
\right.
= 7125 .
\]
The algorithm first computes m[i, i] = 0 for i = 1, 2, ..., n (the minimum
costs for chains of length 1) in lines 3–4. It then uses recurrence (15.7) to compute
m[i, i + 1] for i = 1, 2, ..., n − 1 (the minimum costs for chains of length l = 2)
during the first execution of the for loop in lines 5–13. The second time through the
loop, it computes m[i, i + 2] for i = 1, 2, ..., n − 2 (the minimum costs for chains of
length l = 3), and so forth. At each step, the m[i, j] cost computed in lines 10–13
depends only on table entries m[i, k] and m[k + 1, j] already computed.

Figure 15.5 illustrates this procedure on a chain of n = 6 matrices. Since
we have defined m[i, j] only for i ≤ j, only the portion of the table m on or
above the main diagonal is used. The figure shows the table rotated to make the
main diagonal run horizontally. The matrix chain is listed along the bottom.
Using this layout, we can find the minimum cost m[i, j] for multiplying a subchain
A_i A_{i+1} ⋯ A_j of matrices at the intersection of lines running northeast from A_i and
northwest from A_j. Each horizontal row in the table contains the entries for matrix
chains of the same length. Matrix-Chain-Order computes the rows from
bottom to top and from left to right within each row. It computes each entry m[i, j]
using the products p_{i-1} p_k p_j for k = i, i + 1, ..., j − 1 and all entries southwest
and southeast from m[i, j].

A simple inspection of the nested loop structure of Matrix-Chain-Order
yields a running time of O(n^3) for the algorithm. The loops are nested three deep,
and each loop index (l, i, and k) takes on at most n − 1 values. Exercise 15.2-5 asks
you to show that the running time of this algorithm is in fact also Ω(n^3). The
algorithm requires Θ(n^2) space to store the m and s tables. Thus,
Matrix-Chain-Order is much more efficient than the exponential-time method of enumerating
all possible parenthesizations and checking each one.
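The cubic growth can also be observed empirically by counting how often the innermost statement (line 10) executes. The Python sketch below keeps only the loop structure of Matrix-Chain-Order; it is an experiment rather than a proof, and the function name is an assumption.

```python
def innermost_iterations(n):
    """Count executions of line 10 of Matrix-Chain-Order for a chain of n."""
    count = 0
    for l in range(2, n + 1):            # chain length
        for i in range(1, n - l + 2):    # left end of the chain
            j = i + l - 1
            for k in range(i, j):        # split point
                count += 1
    return count
```

For n = 6 the count is 35, and for the small values tried it agrees with (n^3 − n)/6, which is consistent with the O(n^3) bound stated above.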

Step  4:  Constructing  an  optimal  solution 

Although Matrix-Chain-Order determines the optimal number of scalar
multiplications needed to compute a matrix-chain product, it does not directly show
how to multiply the matrices. The table s[1..n−1, 2..n] gives us the
information we need to do so. Each entry s[i, j] records a value of k such that an
optimal parenthesization of A_i A_{i+1} ⋯ A_j splits the product between A_k and A_{k+1}.
Thus, we know that the final matrix multiplication in computing A_{1..n} optimally
is A_{1..s[1,n]} A_{s[1,n]+1..n}. We can determine the earlier matrix multiplications
recursively, since s[1, s[1, n]] determines the last matrix multiplication when computing
A_{1..s[1,n]} and s[s[1, n] + 1, n] determines the last matrix multiplication when
computing A_{s[1,n]+1..n}. The following recursive procedure prints an optimal
parenthesization of ⟨A_i, A_{i+1}, ..., A_j⟩, given the s table computed by
Matrix-Chain-Order and the indices i and j. The initial call Print-Optimal-Parens(s, 1, n)
prints an optimal parenthesization of ⟨A_1, A_2, ..., A_n⟩.

Print-Optimal-Parens(s, i, j)
1  if i == j
2      print “A”_i
3  else print “(”
4      Print-Optimal-Parens(s, i, s[i, j])
5      Print-Optimal-Parens(s, s[i, j] + 1, j)
6      print “)”

In the example of Figure 15.5, the call Print-Optimal-Parens(s, 1, 6) prints
the parenthesization ((A_1(A_2A_3))((A_4A_5)A_6)).
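The reconstruction step can be tried out with the Python sketch below. To keep the block runnable on its own, it repeats a compact table-building routine (an assumed rendering of Matrix-Chain-Order with index 0 unused) and returns the parenthesization as a string instead of printing it; matrix A_i is written as "A" followed by i.

```python
import math

def matrix_chain_order(p):
    """Return (m, s) for dimensions p[0..n]; tables are 1-based with padding."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for l in range(2, n + 1):
        for i in range(1, n - l + 2):
            j = i + l - 1
            m[i][j] = math.inf
            for k in range(i, j):
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j], s[i][j] = q, k
    return m, s

def optimal_parens(s, i, j):
    """Return the parenthesization of A_i..A_j recorded in table s."""
    if i == j:
        return f"A{i}"
    k = s[i][j]                      # last multiplication splits after A_k
    return "(" + optimal_parens(s, i, k) + optimal_parens(s, k + 1, j) + ")"

# Dimensions from Figure 15.5.
_, S_TABLE = matrix_chain_order([30, 35, 15, 5, 10, 20, 25])
RESULT = optimal_parens(S_TABLE, 1, 6)
```

For the dimensions of Figure 15.5, RESULT is the string "((A1(A2A3))((A4A5)A6))".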
Exercises


15.2-1 


Find an optimal parenthesization of a matrix-chain product whose sequence of
dimensions is ⟨5, 10, 3, 12, 5, 50, 6⟩.

15.2-2

Give a recursive algorithm Matrix-Chain-Multiply(A, s, i, j) that actually
performs the optimal matrix-chain multiplication, given the sequence of matrices
⟨A_1, A_2, ..., A_n⟩, the s table computed by Matrix-Chain-Order, and the
indices i and j. (The initial call would be Matrix-Chain-Multiply(A, s, 1, n).)

15.2-3

Use the substitution method to show that the solution to the recurrence (15.6)
is Ω(2^n).

15.2-4

Describe  the  subproblem  graph  for  matrix-chain  multiplication  with  an  input  chain 
of  length  n.  How  many  vertices  does  it  have?  How  many  edges  does  it  have,  and 
which  edges  are  they? 


15.2-5 


Let R(i, j) be the number of times that table entry m[i, j] is referenced while
computing other table entries in a call of Matrix-Chain-Order. Show that the
total number of references for the entire table is

\[ \sum_{i=1}^{n} \sum_{j=i}^{n} R(i, j) = \frac{n^3 - n}{3} . \]

(Hint: You may find equation (A.3) useful.)


15.2-6 


Show that a full parenthesization of an n-element expression has exactly n − 1 pairs
of parentheses.


15.3  Elements  of  dynamic  programming 


Although we have just worked through two examples of the dynamic-programming
method, you might still be wondering just when the method applies. From an
engineering perspective, when should we look for a dynamic-programming solution
to a problem? In this section, we examine the two key ingredients that an
optimization problem must have in order for dynamic programming to apply: optimal
substructure and overlapping subproblems. We also revisit and discuss more fully
how memoization might help us take advantage of the overlapping-subproblems
property in a top-down recursive approach.

Optimal  substructure 

The first step in solving an optimization problem by dynamic programming is to
characterize the structure of an optimal solution. Recall that a problem exhibits
optimal substructure if an optimal solution to the problem contains within it
optimal solutions to subproblems. Whenever a problem exhibits optimal substructure,
we have a good clue that dynamic programming might apply. (As Chapter 16
discusses, it also might mean that a greedy strategy applies, however.) In dynamic
programming, we build an optimal solution to the problem from optimal solutions
to subproblems. Consequently, we must take care to ensure that the range of
subproblems we consider includes those used in an optimal solution.

We discovered optimal substructure in both of the problems we have examined
in this chapter so far. In Section 15.1, we observed that the optimal way of
cutting up a rod of length n (if we make any cuts at all) involves optimally cutting
up the two pieces resulting from the first cut. In Section 15.2, we observed that
an optimal parenthesization of A_i A_{i+1} ⋯ A_j that splits the product between A_k
and A_{k+1} contains within it optimal solutions to the problems of parenthesizing
A_i A_{i+1} ⋯ A_k and A_{k+1} A_{k+2} ⋯ A_j.

You will find yourself following a common pattern in discovering optimal
substructure:

1. You show that a solution to the problem consists of making a choice, such as
choosing an initial cut in a rod or choosing an index at which to split the matrix
chain. Making this choice leaves one or more subproblems to be solved.

2. You suppose that for a given problem, you are given the choice that leads to an
optimal solution. You do not concern yourself yet with how to determine this
choice. You just assume that it has been given to you.

3. Given this choice, you determine which subproblems ensue and how to best
characterize the resulting space of subproblems.

4. You show that the solutions to the subproblems used within an optimal solution
to the problem must themselves be optimal by using a “cut-and-paste”
technique. You do so by supposing that each of the subproblem solutions is not
optimal and then deriving a contradiction. In particular, by “cutting out” the
nonoptimal solution to each subproblem and “pasting in” the optimal one, you
show that you can get a better solution to the original problem, thus
contradicting your supposition that you already had an optimal solution. If an optimal
solution gives rise to more than one subproblem, they are typically so similar
that you can modify the cut-and-paste argument for one to apply to the others
with little effort.

To characterize the space of subproblems, a good rule of thumb says to try to
keep the space as simple as possible and then expand it as necessary. For example,
the space of subproblems that we considered for the rod-cutting problem contained
the problems of optimally cutting up a rod of length i for each size i. This
subproblem space worked well, and we had no need to try a more general space of
subproblems.

Conversely, suppose that we had tried to constrain our subproblem space for
matrix-chain multiplication to matrix products of the form A_1 A_2 ⋯ A_j. As before,
an optimal parenthesization must split this product between A_k and A_{k+1} for some
1 ≤ k < j. Unless we could guarantee that k always equals j − 1, we would find
that we had subproblems of the form A_1 A_2 ⋯ A_k and A_{k+1} A_{k+2} ⋯ A_j, and that
the latter subproblem is not of the form A_1 A_2 ⋯ A_j. For this problem, we needed
to allow our subproblems to vary at “both ends,” that is, to allow both i and j to
vary in the subproblem A_i A_{i+1} ⋯ A_j.

Optimal  substructure  varies  across  problem  domains  in  two  ways: 

1. how many subproblems an optimal solution to the original problem uses, and

2.  how  many  choices  we  have  in  determining  which  subproblem(s)  to  use  in  an 
optimal  solution. 

In the rod-cutting problem, an optimal solution for cutting up a rod of size n
uses just one subproblem (of size n − i), but we must consider n choices for i
in order to determine which one yields an optimal solution. Matrix-chain
multiplication for the subchain A_i A_{i+1} ⋯ A_j serves as an example with two
subproblems and j − i choices. For a given matrix A_k at which we split the
product, we have two subproblems (parenthesizing A_i A_{i+1} ⋯ A_k and parenthesizing
A_{k+1} A_{k+2} ⋯ A_j), and we must solve both of them optimally. Once we determine
the optimal solutions to subproblems, we choose from among j − i candidates for
the index k.

Informally, the running time of a dynamic-programming algorithm depends on
the product of two factors: the number of subproblems overall and how many
choices we look at for each subproblem. In rod cutting, we had Θ(n) subproblems
overall, and at most n choices to examine for each, yielding an O(n^2) running time.
Matrix-chain multiplication had Θ(n^2) subproblems overall, and in each we had at
most n − 1 choices, giving an O(n^3) running time (actually, a Θ(n^3) running time,
by Exercise 15.2-5).

Usually, the subproblem graph gives an alternative way to perform the same
analysis. Each vertex corresponds to a subproblem, and the choices for a
subproblem are the edges incident to that subproblem. Recall that in rod cutting,
the subproblem graph had n vertices and at most n edges per vertex, yielding an
O(n^2) running time. For matrix-chain multiplication, if we were to draw the
subproblem graph, it would have Θ(n^2) vertices and each vertex would have degree at
most n − 1, giving a total of O(n^3) vertices and edges.

Dynamic programming often uses optimal substructure in a bottom-up fashion.
That is, we first find optimal solutions to subproblems and, having solved the
subproblems, we find an optimal solution to the problem. Finding an optimal
solution to the problem entails making a choice among subproblems as to which we
will use in solving the problem. The cost of the problem solution is usually the
subproblem costs plus a cost that is directly attributable to the choice itself. In
rod cutting, for example, first we solved the subproblems of determining optimal
ways to cut up rods of length i for i = 0, 1, ..., n − 1, and then we determined
which such subproblem yielded an optimal solution for a rod of length n, using
equation (15.2). The cost attributable to the choice itself is the term p_i in
equation (15.2). In matrix-chain multiplication, we determined optimal
parenthesizations of subchains of A_i A_{i+1} ⋯ A_j, and then we chose the matrix A_k at which to
split the product. The cost attributable to the choice itself is the term p_{i-1} p_k p_j.

In Chapter 16, we shall examine “greedy algorithms,” which have many
similarities to dynamic programming. In particular, problems to which greedy algorithms
apply have optimal substructure. One major difference between greedy algorithms
and dynamic programming is that instead of first finding optimal solutions to
subproblems and then making an informed choice, greedy algorithms first make a
“greedy” choice (the choice that looks best at the time) and then solve a resulting
subproblem, without bothering to solve all possible related smaller subproblems.
Surprisingly, in some cases this strategy works!

Subtleties 

You should be careful not to assume that optimal substructure applies when it does
not. Consider the following two problems in which we are given a directed graph
G = (V, E) and vertices u, v ∈ V.

Unweighted shortest path:³ Find a path from u to v consisting of the fewest
edges. Such a path must be simple, since removing a cycle from a path
produces a path with fewer edges.


³We use the term “unweighted” to distinguish this problem from that of finding shortest paths with
weighted edges, which we shall see in Chapters 24 and 25. We can use the breadth-first search
technique of Chapter 22 to solve the unweighted problem.




Chapter  15  Dynamic  Programming 


Figure 15.6 A directed graph showing that the problem of finding a longest simple path in an
unweighted directed graph does not have optimal substructure. The path $q \to r \to t$ is a longest
simple path from $q$ to $t$, but the subpath $q \to r$ is not a longest simple path from $q$ to $r$,
nor is the subpath $r \to t$ a longest simple path from $r$ to $t$.

Unweighted longest simple path: Find a simple path from $u$ to $v$ consisting of
the most edges. We need to include the requirement of simplicity because otherwise
we can traverse a cycle as many times as we like to create paths with an
arbitrarily large number of edges.

The unweighted shortest-path problem exhibits optimal substructure, as follows.
Suppose that $u \neq v$, so that the problem is nontrivial. Then, any path $p$ from $u$
to $v$ must contain an intermediate vertex, say $w$. (Note that $w$ may be $u$ or $v$.)
Thus, we can decompose the path $u \overset{p}{\leadsto} v$ into subpaths
$u \overset{p_1}{\leadsto} w \overset{p_2}{\leadsto} v$. Clearly, the
number of edges in $p$ equals the number of edges in $p_1$ plus the number of edges
in $p_2$. We claim that if $p$ is an optimal (i.e., shortest) path from $u$ to $v$, then $p_1$
must be a shortest path from $u$ to $w$. Why? We use a “cut-and-paste” argument:
if there were another path, say $p_1'$, from $u$ to $w$ with fewer edges than $p_1$, then we
could cut out $p_1$ and paste in $p_1'$ to produce a path
$u \overset{p_1'}{\leadsto} w \overset{p_2}{\leadsto} v$ with fewer edges
than $p$, thus contradicting $p$'s optimality. Symmetrically, $p_2$ must be a shortest
path from $w$ to $v$. Thus, we can find a shortest path from $u$ to $v$ by considering
all intermediate vertices $w$, finding a shortest path from $u$ to $w$ and a shortest path
from $w$ to $v$, and choosing an intermediate vertex $w$ that yields the overall shortest
path. In Section 25.2, we use a variant of this observation of optimal substructure
to find a shortest path between every pair of vertices on a weighted, directed graph.

You might be tempted to assume that the problem of finding an unweighted
longest simple path exhibits optimal substructure as well. After all, if we decompose
a longest simple path $u \overset{p}{\leadsto} v$ into subpaths
$u \overset{p_1}{\leadsto} w \overset{p_2}{\leadsto} v$, then mustn’t $p_1$
be a longest simple path from $u$ to $w$, and mustn’t $p_2$ be a longest simple path
from $w$ to $v$? The answer is no! Figure 15.6 supplies an example. Consider the
path $q \to r \to t$, which is a longest simple path from $q$ to $t$. Is $q \to r$ a longest
simple path from $q$ to $r$? No, for the path $q \to s \to t \to r$ is a simple path
that is longer. Is $r \to t$ a longest simple path from $r$ to $t$? No again, for the path
$r \to q \to s \to t$ is a simple path that is longer.

This example shows that for longest simple paths, not only does the problem
lack optimal substructure, but we cannot necessarily assemble a “legal” solution
to the problem from solutions to subproblems. If we combine the longest simple
paths $q \to s \to t \to r$ and $r \to q \to s \to t$, we get the path
$q \to s \to t \to r \to q \to s \to t$, which is not simple. Indeed, the problem of
finding an unweighted longest simple path does not appear to have any sort of optimal
substructure. No efficient dynamic-programming algorithm for this problem has ever been
found. In fact, this problem is NP-complete, which, as we shall see in Chapter 34, means
that we are unlikely to find a way to solve it in polynomial time.
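
The failure of optimal substructure can be checked by brute force on a small graph. The sketch below is only illustrative: the edge set is an assumption, inferred from the paths named in the text (the actual Figure 15.6 may contain further edges), and `longest_simple_path` is a hypothetical helper that simply tries every simple path.

```python
# Edge set inferred from the paths named in the text; the real figure may
# contain additional edges, so treat this graph as an assumption.
GRAPH = {
    "q": ["r", "s"],
    "r": ["q", "t"],
    "s": ["t"],
    "t": ["r"],
}


def longest_simple_path(graph, u, v, visited=None):
    """Return a longest simple path from u to v as a list of vertices, or None."""
    if visited is None:
        visited = {u}
    if u == v:
        return [u]
    best = None
    for w in graph[u]:
        if w in visited:
            continue  # revisiting a vertex would make the path non-simple
        rest = longest_simple_path(graph, w, v, visited | {w})
        if rest is not None and (best is None or len(rest) + 1 > len(best)):
            best = [u] + rest
    return best


q_to_t = longest_simple_path(GRAPH, "q", "t")
q_to_r = longest_simple_path(GRAPH, "q", "r")
r_to_t = longest_simple_path(GRAPH, "r", "t")
spliced = q_to_r + r_to_t[1:]  # join the two subproblem answers at r

print("edges q->t:", len(q_to_t) - 1)
print("edges q->r:", len(q_to_r) - 1, " edges r->t:", len(r_to_t) - 1)
print("spliced path is simple:", len(set(spliced)) == len(spliced))
```

Under this assumed edge set, the two subproblem answers are each longer than the whole problem's answer, and splicing them repeats vertices, which is the behavior the text describes.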

Why is the substructure of a longest simple path so different from that of a shortest
path? Although a solution to a problem for both longest and shortest paths uses
two subproblems, the subproblems in finding the longest simple path are not
independent, whereas for shortest paths they are. What do we mean by subproblems
being independent? We mean that the solution to one subproblem does not affect
the solution to another subproblem of the same problem. For the example of
Figure 15.6, we have the problem of finding a longest simple path from $q$ to $t$ with two
subproblems: finding longest simple paths from $q$ to $r$ and from $r$ to $t$. For the first
of these subproblems, we choose the path $q \to s \to t \to r$, and so we have also
used the vertices $s$ and $t$. We can no longer use these vertices in the second
subproblem, since the combination of the two solutions to subproblems would yield a
path that is not simple. If we cannot use vertex $t$ in the second problem, then we
cannot solve it at all, since $t$ is required to be on the path that we find, and it is
not the vertex at which we are “splicing” together the subproblem solutions (that
vertex being $r$). Because we use vertices $s$ and $t$ in one subproblem solution, we
cannot use them in the other subproblem solution. We must use at least one of them
to solve the other subproblem, however, and we must use both of them to solve it
optimally. Thus, we say that these subproblems are not independent. Looked at
another way, using resources in solving one subproblem (those resources being
vertices) renders them unavailable for the other subproblem.

Why, then, are the subproblems independent for finding a shortest path? The
answer is that by nature, the subproblems do not share resources. We claim that
if a vertex $w$ is on a shortest path $p$ from $u$ to $v$, then we can splice together any
shortest path $u \overset{p_1}{\leadsto} w$ and any shortest path $w \overset{p_2}{\leadsto} v$
to produce a shortest path from $u$ to $v$. We are assured that, other than $w$, no vertex
can appear in both paths $p_1$ and $p_2$. Why? Suppose that some vertex $x \neq w$ appears
in both $p_1$ and $p_2$, so that we can decompose $p_1$ as $u \leadsto x \leadsto w$ and $p_2$ as
$w \leadsto x \leadsto v$. By the optimal substructure of this problem, path $p$ has as many
edges as $p_1$ and $p_2$ together; let’s say that $p$ has $e$ edges. Now let us construct a
path $p' = u \leadsto x \leadsto v$ from $u$ to $v$. Because we have excised the paths from
$x$ to $w$ and from $w$ to $x$, each of which contains at least one edge, path $p'$ contains at
most $e - 2$ edges, which contradicts the assumption that $p$ is a shortest path. Thus, we
are assured that the subproblems for the shortest-path problem are independent.

Both problems examined in Sections 15.1 and 15.2 have independent subproblems.
In matrix-chain multiplication, the subproblems are multiplying subchains
$A_i A_{i+1} \cdots A_k$ and $A_{k+1} A_{k+2} \cdots A_j$. These subchains are disjoint, so that
no matrix could possibly be included in both of them. In rod cutting, to determine the
best way to cut up a rod of length $n$, we look at the best ways of cutting up rods
of length $i$ for $i = 0, 1, \ldots, n - 1$. Because an optimal solution to the length-$n$
problem includes just one of these subproblem solutions (after we have cut off the
first piece), independence of subproblems is not an issue.

Overlapping  subproblems 

The second ingredient that an optimization problem must have for dynamic
programming to apply is that the space of subproblems must be “small” in the sense
that a recursive algorithm for the problem solves the same subproblems over and
over, rather than always generating new subproblems. Typically, the total number
of distinct subproblems is a polynomial in the input size. When a recursive
algorithm revisits the same problem repeatedly, we say that the optimization problem
has overlapping subproblems.4 In contrast, a problem for which a divide-and-conquer
approach is suitable usually generates brand-new problems at each step
of the recursion. Dynamic-programming algorithms typically take advantage of
overlapping subproblems by solving each subproblem once and then storing the
solution in a table where it can be looked up when needed, using constant time per
lookup.

In Section 15.1, we briefly examined how a recursive solution to rod cutting
makes exponentially many calls to find solutions of smaller subproblems.
Our dynamic-programming solution takes an exponential-time recursive algorithm
down to quadratic time.

4 It may seem strange that dynamic programming relies on subproblems being both independent
and overlapping. Although these requirements may sound contradictory, they describe two different
notions, rather than two points on the same axis. Two subproblems of the same problem are
independent if they do not share resources. Two subproblems are overlapping if they are really the same
subproblem that occurs as a subproblem of different problems.

Figure 15.7 The recursion tree for the computation of Recursive-Matrix-Chain($p$, 1, 4).
Each node contains the parameters $i$ and $j$. The computations performed in a shaded subtree are
replaced by a single table lookup in Memoized-Matrix-Chain.

To illustrate the overlapping-subproblems property in greater detail, let us
reexamine the matrix-chain multiplication problem. Referring back to Figure 15.5,
observe that Matrix-Chain-Order repeatedly looks up the solution to subproblems
in lower rows when solving subproblems in higher rows. For example, it
references entry $m[3, 4]$ four times: during the computations of $m[2, 4]$, $m[1, 4]$,
$m[3, 5]$, and $m[3, 6]$. If we were to recompute $m[3, 4]$ each time, rather than just
looking it up, the running time would increase dramatically. To see how, consider
the following (inefficient) recursive procedure that determines $m[i, j]$, the minimum
number of scalar multiplications needed to compute the matrix-chain product
$A_{i..j} = A_i A_{i+1} \cdots A_j$. The procedure is based directly on the recurrence (15.7).

Recursive-Matrix-Chain(p, i, j)

1  if i == j
2      return 0
3  m[i, j] = ∞
4  for k = i to j - 1
5      q = Recursive-Matrix-Chain(p, i, k)
           + Recursive-Matrix-Chain(p, k + 1, j)
           + p_{i-1} p_k p_j
6      if q < m[i, j]
7          m[i, j] = q
8  return m[i, j]

Figure 15.7 shows the recursion tree produced by the call
Recursive-Matrix-Chain($p$, 1, 4). Each node is labeled by the values of the parameters $i$ and $j$.
Observe that some pairs of values occur many times.
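
To make the repetition concrete, here is a minimal Python sketch of the same recursion, instrumented to count how often each pair $(i, j)$ is visited. The `calls` counter and the dimension sequence `dims` are illustrative assumptions, not part of the procedure above; `p` is indexed so that matrix $A_t$ has dimensions `p[t-1]` by `p[t]`.

```python
from collections import Counter


def recursive_matrix_chain(p, i, j, calls):
    """Minimum scalar multiplications for A_i..A_j, where A_t is p[t-1] x p[t]."""
    calls[(i, j)] += 1  # record every visit to subproblem (i, j)
    if i == j:
        return 0
    best = float("inf")
    for k in range(i, j):  # k = i, ..., j - 1
        q = (recursive_matrix_chain(p, i, k, calls)
             + recursive_matrix_chain(p, k + 1, j, calls)
             + p[i - 1] * p[k] * p[j])
        best = min(best, q)
    return best


dims = [5, 10, 3, 12, 5]  # hypothetical dimensions for a chain of 4 matrices
calls = Counter()
cost = recursive_matrix_chain(dims, 1, 4, calls)
print("cost:", cost)
print("total calls:", sum(calls.values()), "distinct subproblems:", len(calls))
```

The gap between the total number of calls and the number of distinct pairs is the redundant work that a table lookup removes.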

In fact, we can show that the time to compute $m[1, n]$ by this recursive procedure
is at least exponential in $n$. Let $T(n)$ denote the time taken by
Recursive-Matrix-Chain to compute an optimal parenthesization of a chain of $n$ matrices.
Because the execution of lines 1-2 and of lines 6-7 each take at least unit time, as
does the multiplication in line 5, inspection of the procedure yields the recurrence

$$T(1) \ge 1,$$

$$T(n) \ge 1 + \sum_{k=1}^{n-1} \bigl( T(k) + T(n-k) + 1 \bigr) \quad \text{for } n > 1.$$

Noting that for $i = 1, 2, \ldots, n - 1$, each term $T(i)$ appears once as $T(k)$ and once
as $T(n - k)$, and collecting the $n - 1$ 1s in the summation together with the 1 out
front, we can rewrite the recurrence as

$$T(n) \ge 2 \sum_{i=1}^{n-1} T(i) + n. \tag{15.8}$$

We shall prove that $T(n) = \Omega(2^n)$ using the substitution method. Specifically,
we shall show that $T(n) \ge 2^{n-1}$ for all $n \ge 1$. The basis is easy, since
$T(1) \ge 1 = 2^0$. Inductively, for $n \ge 2$ we have

$$
\begin{aligned}
T(n) &\ge 2 \sum_{i=1}^{n-1} 2^{i-1} + n \\
     &= 2 \sum_{i=0}^{n-2} 2^{i} + n \\
     &= 2(2^{n-1} - 1) + n \quad \text{(by equation (A.5))} \\
     &= 2^{n} - 2 + n \\
     &\ge 2^{n-1},
\end{aligned}
$$

which completes the proof. Thus, the total amount of work performed by the call
Recursive-Matrix-Chain($p$, 1, $n$) is at least exponential in $n$.
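
As a numerical sanity check on the bound (not a substitute for the proof), the following sketch evaluates recurrence (15.8) taken with equality and $T(1) = 1$, and compares each value against $2^{n-1}$. The function name and the cutoff of 10 are arbitrary choices for illustration.

```python
def recurrence_lower_bound(n_max):
    """Evaluate T(1) = 1, T(n) = 2 * (T(1) + ... + T(n-1)) + n, i.e. (15.8) with equality."""
    t = [0, 1]  # t[n] holds T(n); index 0 is unused
    for n in range(2, n_max + 1):
        t.append(2 * sum(t[1:n]) + n)
    return t


values = recurrence_lower_bound(10)
for n in range(1, 11):
    # the claimed bound T(n) >= 2^(n-1)
    print(n, values[n], 2 ** (n - 1), values[n] >= 2 ** (n - 1))
```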

Compare this top-down, recursive algorithm (without memoization) with the
bottom-up dynamic-programming algorithm. The latter is more efficient because
it takes advantage of the overlapping-subproblems property. Matrix-chain
multiplication has only $\Theta(n^2)$ distinct subproblems, and the dynamic-programming
algorithm solves each exactly once. The recursive algorithm, on the other hand,
must again solve each subproblem every time it reappears in the recursion tree.
Whenever a recursion tree for the natural recursive solution to a problem contains
the same subproblem repeatedly, and the total number of distinct subproblems is
small, dynamic programming can improve efficiency, sometimes dramatically.

Reconstructing an optimal solution

As  a  practical  matter,  we  often  store  which  choice  we  made  in  each  subproblem  in 
a  table  so  that  we  do  not  have  to  reconstruct  this  information  from  the  costs  that  we 
stored. 

For matrix-chain multiplication, the table $s[i, j]$ saves us a significant amount of
work when reconstructing an optimal solution. Suppose that we did not maintain
the $s[i, j]$ table, having filled in only the table $m[i, j]$ containing optimal subproblem
costs. We choose from among $j - i$ possibilities when we determine which
subproblems to use in an optimal solution to parenthesizing $A_i A_{i+1} \cdots A_j$, and
$j - i$ is not a constant. Therefore, it would take $\Theta(j - i) = \omega(1)$ time to reconstruct
which subproblems we chose for a solution to a given problem. By storing
in $s[i, j]$ the index of the matrix at which we split the product $A_i A_{i+1} \cdots A_j$, we
can reconstruct each choice in $O(1)$ time.
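
The difference between the two ways of recovering a choice can be sketched in Python. This is an illustrative version of a bottom-up table fill that records split points, not a transcription of the book's procedures; the tables are dictionaries keyed by `(i, j)`, and the dimension sequence is an assumed example input.

```python
def matrix_chain_order(p):
    """Bottom-up tables m (costs) and s (split points), keyed by (i, j), 1-indexed."""
    n = len(p) - 1
    m = {(i, i): 0 for i in range(1, n + 1)}
    s = {}
    for length in range(2, n + 1):  # chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i, j] = float("inf")
            for k in range(i, j):
                q = m[i, k] + m[k + 1, j] + p[i - 1] * p[k] * p[j]
                if q < m[i, j]:
                    m[i, j], s[i, j] = q, k  # remember the choice, not just the cost
    return m, s


def optimal_parens(s, i, j):
    """Rebuild the parenthesization; each split is a single lookup in s."""
    if i == j:
        return f"A{i}"
    k = s[i, j]
    return f"({optimal_parens(s, i, k)}{optimal_parens(s, k + 1, j)})"


def split_from_costs(m, p, i, j):
    """Recover the split without s by rescanning all j - i candidates."""
    for k in range(i, j):
        if m[i, k] + m[k + 1, j] + p[i - 1] * p[k] * p[j] == m[i, j]:
            return k


dims = [30, 35, 15, 5, 10, 20, 25]  # assumed example: six matrices
m, s = matrix_chain_order(dims)
print(m[1, 6], optimal_parens(s, 1, 6))
print(split_from_costs(m, dims, 1, 6) == s[1, 6])
```

`split_from_costs` reaches the same answer as the lookup in `s`, but only after a scan whose length grows with $j - i$.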

Memoization 

As we saw for the rod-cutting problem, there is an alternative approach to
dynamic programming that often offers the efficiency of the bottom-up
dynamic-programming approach while maintaining a top-down strategy. The idea is to
memoize the natural, but inefficient, recursive algorithm. As in the bottom-up
approach, we maintain a table with subproblem solutions, but the control structure
for filling in the table is more like the recursive algorithm.

A  memoized  recursive  algorithm  maintains  an  entry  in  a  table  for  the  solution  to 
each  subproblem.  Each  table  entry  initially  contains  a  special  value  to  indicate  that 
the  entry  has  yet  to  be  filled  in.  When  the  subproblem  is  first  encountered  as  the 
recursive  algorithm  unfolds,  its  solution  is  computed  and  then  stored  in  the  table. 
Each  subsequent  time  that  we  encounter  this  subproblem,  we  simply  look  up  the 
value  stored  in  the  table  and  return  it.5 

Here  is  a  memoized  version  of  Recursive-Matrix-Chain.  Note  where  it 
resembles  the  memoized  top-down  method  for  the  rod-cutting  problem. 


5 This approach presupposes that we know the set of all possible subproblem parameters and that we
have established the relationship between table positions and subproblems. Another, more general,
approach is to memoize by using hashing with the subproblem parameters as keys.

Memoized-Matrix-Chain(p)

1  n = p.length - 1
2  let m[1..n, 1..n] be a new table
3  for i = 1 to n
4      for j = i to n
5          m[i, j] = ∞
6  return Lookup-Chain(m, p, 1, n)

Lookup-Chain(m, p, i, j)

1  if m[i, j] < ∞
2      return m[i, j]
3  if i == j
4      m[i, j] = 0
5  else for k = i to j - 1
6          q = Lookup-Chain(m, p, i, k)
               + Lookup-Chain(m, p, k + 1, j) + p_{i-1} p_k p_j
7          if q < m[i, j]
8              m[i, j] = q
9  return m[i, j]

The Memoized-Matrix-Chain procedure, like Matrix-Chain-Order,
maintains a table $m[1..n, 1..n]$ of computed values of $m[i, j]$, the minimum number
of scalar multiplications needed to compute the matrix $A_{i..j}$. Each table entry
initially contains the value $\infty$ to indicate that the entry has yet to be filled in. Upon
calling Lookup-Chain($m$, $p$, $i$, $j$), if line 1 finds that $m[i, j] < \infty$, then the
procedure simply returns the previously computed cost $m[i, j]$ in line 2. Otherwise,
the cost is computed as in Recursive-Matrix-Chain, stored in $m[i, j]$, and
returned. Thus, Lookup-Chain($m$, $p$, $i$, $j$) always returns the value of $m[i, j]$,
but it computes it only upon the first call of Lookup-Chain with these specific
values of $i$ and $j$.

Figure  15.7  illustrates  how  Memoized-Matrix-Chain  saves  time  compared 
with  Recursive-Matrix-Chain.  Shaded  subtrees  represent  values  that  it  looks 
up  rather  than  recomputes. 
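
A minimal Python sketch of the memoized procedure follows. The `stats` dictionary is an added assumption used only to count the two kinds of calls; the table is a list of lists whose row 0 and column 0 are unused so that the indices match the pseudocode, and the dimension sequence is an assumed example input.

```python
import math


def memoized_matrix_chain(p):
    n = len(p) - 1
    # math.inf marks an entry that has not been filled in yet
    m = [[math.inf] * (n + 1) for _ in range(n + 1)]
    stats = {"computed": 0, "looked_up": 0}
    cost = lookup_chain(m, p, 1, n, stats)
    return cost, stats


def lookup_chain(m, p, i, j, stats):
    if m[i][j] < math.inf:
        stats["looked_up"] += 1  # entry already filled in: just return it
        return m[i][j]
    stats["computed"] += 1  # first visit to (i, j): compute and store
    if i == j:
        m[i][j] = 0
    else:
        for k in range(i, j):
            q = (lookup_chain(m, p, i, k, stats)
                 + lookup_chain(m, p, k + 1, j, stats)
                 + p[i - 1] * p[k] * p[j])
            if q < m[i][j]:
                m[i][j] = q
    return m[i][j]


cost, stats = memoized_matrix_chain([30, 35, 15, 5, 10, 20, 25])
print(cost, stats)
```

For a chain of six matrices, the number of computed entries equals the number of pairs $i \le j$; every other call is a constant-time lookup.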

Like the bottom-up dynamic-programming algorithm Matrix-Chain-Order,
the procedure Memoized-Matrix-Chain runs in $O(n^3)$ time. Line 5 of
Memoized-Matrix-Chain executes $\Theta(n^2)$ times. We can categorize the calls
of Lookup-Chain into two types:

1. calls in which $m[i, j] = \infty$, so that lines 3-9 execute, and

2. calls in which $m[i, j] < \infty$, so that Lookup-Chain simply returns in line 2.

There are $\Theta(n^2)$ calls of the first type, one per table entry. All calls of the
second type are made as recursive calls by calls of the first type. Whenever a given
call of Lookup-Chain makes recursive calls, it makes $O(n)$ of them. Therefore,
there are $O(n^3)$ calls of the second type in all. Each call of the second type
takes $O(1)$ time, and each call of the first type takes $O(n)$ time plus the time spent
in its recursive calls. The total time, therefore, is $O(n^3)$. Memoization thus turns
an $\Omega(2^n)$-time algorithm into an $O(n^3)$-time algorithm.

In summary, we can solve the matrix-chain multiplication problem by either a
top-down, memoized dynamic-programming algorithm or a bottom-up
dynamic-programming algorithm in $O(n^3)$ time. Both methods take advantage of the
overlapping-subproblems property. There are only $\Theta(n^2)$ distinct subproblems in
total, and either of these methods computes the solution to each subproblem only
once. Without memoization, the natural recursive algorithm runs in exponential
time, since solved subproblems are repeatedly solved.

In general practice, if all subproblems must be solved at least once, a bottom-up
dynamic-programming algorithm usually outperforms the corresponding top-down
memoized algorithm by a constant factor, because the bottom-up algorithm has no
overhead for recursion and less overhead for maintaining the table. Moreover, for
some problems we can exploit the regular pattern of table accesses in the
dynamic-programming algorithm to reduce time or space requirements even further.
Alternatively, if some subproblems in the subproblem space need not be solved at all,
the memoized solution has the advantage of solving only those subproblems that
are definitely required.

Exercises 


15.3-1 

Which is a more efficient way to determine the optimal number of multiplications
in a matrix-chain multiplication problem: enumerating all the ways of parenthesizing
the product and computing the number of multiplications for each, or running
Recursive-Matrix-Chain? Justify your answer.


15.3-2 

Draw the recursion tree for the Merge-Sort procedure from Section 2.3.1 on an
array of 16 elements. Explain why memoization fails to speed up a good
divide-and-conquer algorithm such as Merge-Sort.


15.3-3 

Consider a variant of the matrix-chain multiplication problem in which the goal is
to parenthesize the sequence of matrices so as to maximize, rather than minimize,
the number of scalar multiplications. Does this problem exhibit optimal substructure?


15.3-4 

As stated, in dynamic programming we first solve the subproblems and then choose
which of them to use in an optimal solution to the problem. Professor Capulet
claims that we do not always need to solve all the subproblems in order to find an
optimal solution. She suggests that we can find an optimal solution to the
matrix-chain multiplication problem by always choosing the matrix $A_k$ at which to split
the subproduct $A_i A_{i+1} \cdots A_j$ (by selecting $k$ to minimize the quantity $p_{i-1} p_k p_j$)
before solving the subproblems. Find an instance of the matrix-chain multiplication
problem for which this greedy approach yields a suboptimal solution.


15.3-5

Suppose that in the rod-cutting problem of Section 15.1, we also had limit $l_i$ on the
number of pieces of length $i$ that we are allowed to produce, for $i = 1, 2, \ldots, n$.
Show that the optimal-substructure property described in Section 15.1 no longer
holds.

15.3-6

Imagine that you wish to exchange one currency for another. You realize that
instead of directly exchanging one currency for another, you might be better off
making a series of trades through other currencies, winding up with the currency
you want. Suppose that you can trade $n$ different currencies, numbered $1, 2, \ldots, n$,
where you start with currency 1 and wish to wind up with currency $n$. You are
given, for each pair of currencies $i$ and $j$, an exchange rate $r_{ij}$, meaning that if
you start with $d$ units of currency $i$, you can trade for $d\,r_{ij}$ units of currency $j$.
A sequence of trades may entail a commission, which depends on the number of
trades you make. Let $c_k$ be the commission that you are charged when you make $k$
trades. Show that, if $c_k = 0$ for all $k = 1, 2, \ldots, n$, then the problem of finding the
best sequence of exchanges from currency 1 to currency $n$ exhibits optimal
substructure. Then show that if commissions $c_k$ are arbitrary values, then the problem
of finding the best sequence of exchanges from currency 1 to currency $n$ does not
necessarily exhibit optimal substructure.


15.4  Longest  common  subsequence 

Biological applications often need to compare the DNA of two (or more) different
organisms. A strand of DNA consists of a string of molecules called
bases, where the possible bases are adenine, guanine, cytosine, and thymine.
Representing each of these bases by its initial letter, we can express a strand
of DNA as a string over the finite set {A, C, G, T}. (See Appendix C for
the definition of a string.) For example, the DNA of one organism may be
$S_1$ = ACCGGTCGAGTGCGCGGAAGCCGGCCGAA, and the DNA of another organism
may be $S_2$ = GTCGTTCGGAATGCCGTTGCTCTGTAAA. One reason to compare
two strands of DNA is to determine how “similar” the two strands are, as some
measure of how closely related the two organisms are. We can, and do, define
similarity in many different ways. For example, we can say that two DNA strands are
similar if one is a substring of the other. (Chapter 32 explores algorithms to solve
this problem.) In our example, neither $S_1$ nor $S_2$ is a substring of the other.
Alternatively, we could say that two strands are similar if the number of changes needed
to turn one into the other is small. (Problem 15-5 looks at this notion.) Yet another
way to measure the similarity of strands $S_1$ and $S_2$ is by finding a third strand $S_3$
in which the bases in $S_3$ appear in each of $S_1$ and $S_2$; these bases must appear
in the same order, but not necessarily consecutively. The longer the strand $S_3$ we
can find, the more similar $S_1$ and $S_2$ are. In our example, the longest strand $S_3$ is
GTCGTCGGAAGCCGGCCGAA.

We formalize this last notion of similarity as the longest-common-subsequence
problem. A subsequence of a given sequence is just the given sequence with zero or
more elements left out. Formally, given a sequence $X = \langle x_1, x_2, \ldots, x_m \rangle$, another
sequence $Z = \langle z_1, z_2, \ldots, z_k \rangle$ is a subsequence of $X$ if there exists a strictly
increasing sequence $\langle i_1, i_2, \ldots, i_k \rangle$ of indices of $X$ such that for all $j = 1, 2, \ldots, k$,
we have $x_{i_j} = z_j$. For example, $Z = \langle B, C, D, B \rangle$ is a subsequence of
$X = \langle A, B, C, B, D, A, B \rangle$ with corresponding index sequence $\langle 2, 3, 5, 7 \rangle$.
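
The definition translates into a short check. The sketch below is one possible way to find an index sequence, scanning $X$ left to right and taking the earliest match for each element of $Z$; the function name is hypothetical, and sequences are represented as Python strings for brevity.

```python
def subsequence_indices(z, x):
    """Return 1-based indices i_1 < ... < i_k with x[i_j] == z[j], or None if none exist."""
    indices = []
    pos = 0  # next 0-based position of x that may still be used
    for symbol in z:
        while pos < len(x) and x[pos] != symbol:
            pos += 1
        if pos == len(x):
            return None  # ran out of x before matching all of z
        indices.append(pos + 1)  # report the index in the text's 1-based convention
        pos += 1
    return indices


print(subsequence_indices("BCDB", "ABCBDAB"))
print(subsequence_indices("BDDB", "ABCBDAB"))
```

Taking the earliest match never hurts, because it leaves the largest possible remainder of $X$ for the rest of $Z$.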

Given two sequences $X$ and $Y$, we say that a sequence $Z$ is a common
subsequence of $X$ and $Y$ if $Z$ is a subsequence of both $X$ and $Y$. For example, if
$X = \langle A, B, C, B, D, A, B \rangle$ and $Y = \langle B, D, C, A, B, A \rangle$, the sequence $\langle B, C, A \rangle$ is
a common subsequence of both $X$ and $Y$. The sequence $\langle B, C, A \rangle$ is not a longest
common subsequence (LCS) of $X$ and $Y$, however, since it has length 3 and the
sequence $\langle B, C, B, A \rangle$, which is also common to both $X$ and $Y$, has length 4. The
sequence $\langle B, C, B, A \rangle$ is an LCS of $X$ and $Y$, as is the sequence $\langle B, D, A, B \rangle$,
since $X$ and $Y$ have no common subsequence of length 5 or greater.

In the longest-common-subsequence problem, we are given two sequences
$X = \langle x_1, x_2, \ldots, x_m \rangle$ and $Y = \langle y_1, y_2, \ldots, y_n \rangle$ and wish to find a
maximum-length common subsequence of $X$ and $Y$. This section shows how to efficiently
solve the LCS problem using dynamic programming.

Step 1: Characterizing a longest common subsequence

In a brute-force approach to solving the LCS problem, we would enumerate all
subsequences of $X$ and check each subsequence to see whether it is also a
subsequence of $Y$, keeping track of the longest subsequence we find. Each subsequence
of $X$ corresponds to a subset of the indices $\{1, 2, \ldots, m\}$ of $X$. Because $X$ has $2^m$
subsequences, this approach requires exponential time, making it impractical for
long sequences.
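
For very short inputs the brute-force approach can be written directly. The following is a sketch under the assumption that sequences are given as Python strings; it enumerates index subsets of $X$ from largest to smallest and stops at the first one that is also a subsequence of $Y$. Both function names are illustrative.

```python
from itertools import combinations


def is_subsequence(z, y):
    """True if z can be obtained from y by deleting zero or more elements."""
    it = iter(y)
    return all(symbol in it for symbol in z)  # each `in` consumes y up to the match


def brute_force_lcs(x, y):
    """Try subsequences of x from longest to shortest; exponential in len(x)."""
    for k in range(len(x), -1, -1):
        for idx in combinations(range(len(x)), k):  # one subset of the indices of x
            candidate = "".join(x[i] for i in idx)
            if is_subsequence(candidate, y):
                return candidate
    return ""


result = brute_force_lcs("ABCBDAB", "BDCABA")
print(result, len(result))
```

In the worst case this examines all $2^m$ index subsets, which is why it is useful only as a reference implementation for checking faster algorithms on small inputs.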

The LCS problem has an optimal-substructure property, however, as the following
theorem shows. As we shall see, the natural classes of subproblems correspond
to pairs of “prefixes” of the two input sequences. To be precise, given a
sequence $X = \langle x_1, x_2, \ldots, x_m \rangle$, we define the $i$th prefix of $X$, for $i = 0, 1, \ldots, m$,
as $X_i = \langle x_1, x_2, \ldots, x_i \rangle$. For example, if $X = \langle A, B, C, B, D, A, B \rangle$, then
$X_4 = \langle A, B, C, B \rangle$ and $X_0$ is the empty sequence.

Theorem 15.1 (Optimal substructure of an LCS)

Let $X = \langle x_1, x_2, \ldots, x_m \rangle$ and $Y = \langle y_1, y_2, \ldots, y_n \rangle$ be sequences, and let
$Z = \langle z_1, z_2, \ldots, z_k \rangle$ be any LCS of $X$ and $Y$.

1. If $x_m = y_n$, then $z_k = x_m = y_n$ and $Z_{k-1}$ is an LCS of $X_{m-1}$ and $Y_{n-1}$.

2. If $x_m \neq y_n$, then $z_k \neq x_m$ implies that $Z$ is an LCS of $X_{m-1}$ and $Y$.

3. If $x_m \neq y_n$, then $z_k \neq y_n$ implies that $Z$ is an LCS of $X$ and $Y_{n-1}$.

Proof (1) If $z_k \neq x_m$, then we could append $x_m = y_n$ to $Z$ to obtain a common
subsequence of $X$ and $Y$ of length $k + 1$, contradicting the supposition that $Z$ is
a longest common subsequence of $X$ and $Y$. Thus, we must have $z_k = x_m = y_n$.
Now, the prefix $Z_{k-1}$ is a length-$(k - 1)$ common subsequence of $X_{m-1}$ and $Y_{n-1}$.
We wish to show that it is an LCS. Suppose for the purpose of contradiction
that there exists a common subsequence $W$ of $X_{m-1}$ and $Y_{n-1}$ with length greater
than $k - 1$. Then, appending $x_m = y_n$ to $W$ produces a common subsequence of
$X$ and $Y$ whose length is greater than $k$, which is a contradiction.

(2) If $z_k \neq x_m$, then $Z$ is a common subsequence of $X_{m-1}$ and $Y$. If there were a
common subsequence $W$ of $X_{m-1}$ and $Y$ with length greater than $k$, then $W$ would
also be a common subsequence of $X_m$ and $Y$, contradicting the assumption that $Z$
is an LCS of $X$ and $Y$.

(3) The proof is symmetric to (2). ■

The way that Theorem 15.1 characterizes longest common subsequences tells
us that an LCS of two sequences contains within it an LCS of prefixes of the two
sequences. Thus, the LCS problem has an optimal-substructure property. A recursive
solution also has the overlapping-subproblems property, as we shall see in a
moment.


Step  2:  A  recursive  solution 


Theorem 15.1 implies that we should examine either one or two subproblems when
finding an LCS of $X = \langle x_1, x_2, \ldots, x_m \rangle$ and $Y = \langle y_1, y_2, \ldots, y_n \rangle$. If $x_m = y_n$,
we must find an LCS of $X_{m-1}$ and $Y_{n-1}$. Appending $x_m = y_n$ to this LCS yields
an LCS of $X$ and $Y$. If $x_m \neq y_n$, then we must solve two subproblems: finding an
LCS of $X_{m-1}$ and $Y$ and finding an LCS of $X$ and $Y_{n-1}$. Whichever of these two
LCSs is longer is an LCS of $X$ and $Y$. Because these cases exhaust all possibilities,
we know that one of the optimal subproblem solutions must appear within an LCS
of $X$ and $Y$.

We can readily see the overlapping-subproblems property in the LCS problem.
To find an LCS of $X$ and $Y$, we may need to find the LCSs of $X$ and $Y_{n-1}$ and
of $X_{m-1}$ and $Y$. But each of these subproblems has the subsubproblem of finding
an LCS of $X_{m-1}$ and $Y_{n-1}$. Many other subproblems share subsubproblems.

As in the matrix-chain multiplication problem, our recursive solution to the LCS
problem involves establishing a recurrence for the value of an optimal solution.
Let us define $c[i, j]$ to be the length of an LCS of the sequences $X_i$ and $Y_j$. If
either $i = 0$ or $j = 0$, one of the sequences has length 0, and so the LCS has
length 0. The optimal substructure of the LCS problem gives the recursive formula


c[i,j] 


0  if  i  =  0  or  j  =  0  , 

c[i  —  1,  j  —  1]  +  1  if  i,  y  >  0  and  x,-  =  yj  , 

max(c[z,  j  —  1],  c[i  —  !,_/])  if  i,  j  >  0  and  x,  /  >7  . 


(15.9) 


Observe that in this recursive formulation, a condition in the problem restricts
which subproblems we may consider. When $x_i = y_j$, we can and should consider
the subproblem of finding an LCS of $X_{i-1}$ and $Y_{j-1}$. Otherwise, we instead
consider the two subproblems of finding an LCS of $X_i$ and $Y_{j-1}$ and of $X_{i-1}$
and $Y_j$. In the previous dynamic-programming algorithms we have examined (for rod
cutting and matrix-chain multiplication), we ruled out no subproblems due to
conditions in the problem. Finding an LCS is not the only dynamic-programming
algorithm that rules out subproblems based on conditions in the problem. For
example, the edit-distance problem (see Problem 15-5) has this characteristic.
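
As an illustration, recurrence (15.9) translates almost line for line into the
following Python sketch. It is not one of the procedures in this section: the name
lcs_length_naive and the use of 0-indexed Python strings are assumptions made for
the example. Because it recomputes shared subproblems, it can take exponential time
and is suitable only for short inputs.

```python
def lcs_length_naive(x, y):
    """Length of an LCS of x and y, computed directly from recurrence (15.9)."""

    def c(i, j):
        # c(i, j) is the LCS length of the prefixes x[:i] and y[:j].
        if i == 0 or j == 0:
            return 0
        # x_i and y_j in the text are x[i - 1] and y[j - 1] here (0-indexed).
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))
```

Calling lcs_length_naive("ABCBDAB", "BDCABA") evaluates the same recurrence that the
tabular method of the next step fills in bottom up.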


Step  3:  Computing  the  length  of  an  LCS 

Based on equation (15.9), we could easily write an exponential-time recursive
algorithm to compute the length of an LCS of two sequences. Since the LCS problem
has only $\Theta(mn)$ distinct subproblems, however, we can use dynamic programming
to compute the solutions bottom up.

Procedure LCS-Length takes two sequences $X = \langle x_1, x_2, \ldots, x_m \rangle$ and
$Y = \langle y_1, y_2, \ldots, y_n \rangle$ as inputs. It stores the $c[i,j]$ values in a
table $c[0..m, 0..n]$, and it computes the entries in row-major order. (That is, the
procedure fills in the first row of $c$ from left to right, then the second row, and
so on.) The procedure also maintains the table $b[1..m, 1..n]$ to help us construct
an optimal solution. Intuitively, $b[i,j]$ points to the table entry corresponding to
the optimal subproblem solution chosen when computing $c[i,j]$. The procedure returns
the $b$ and $c$ tables; $c[m,n]$ contains the length of an LCS of $X$ and $Y$.


LCS-Length(X, Y)
    m = X.length
    n = Y.length
    let b[1..m, 1..n] and c[0..m, 0..n] be new tables
    for i = 1 to m
        c[i, 0] = 0
    for j = 0 to n
        c[0, j] = 0
    for i = 1 to m
        for j = 1 to n
            if x_i == y_j
                c[i, j] = c[i - 1, j - 1] + 1
                b[i, j] = "↖"
            elseif c[i - 1, j] >= c[i, j - 1]
                c[i, j] = c[i - 1, j]
                b[i, j] = "↑"
            else c[i, j] = c[i, j - 1]
                b[i, j] = "←"
    return c and b

Figure 15.8 shows the tables produced by LCS-Length on the sequences
$X = \langle A, B, C, B, D, A, B \rangle$ and $Y = \langle B, D, C, A, B, A \rangle$. The
running time of the procedure is $\Theta(mn)$, since each table entry takes
$\Theta(1)$ time to compute.
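
The bottom-up computation can be sketched in Python as follows. This is an
illustrative translation under stated assumptions, not the book's own code: the
inputs are 0-indexed Python sequences, row 0 and column 0 of b are unused padding,
and the strings "diag", "up", and "left" are hypothetical stand-ins for the arrows
"↖", "↑", and "←".

```python
def lcs_length(x, y):
    """Return tables (c, b) for sequences x and y, filled in row-major order."""
    m, n = len(x), len(y)
    # c[i][j] = LCS length of x[:i] and y[:j]; row 0 and column 0 stay 0.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    # b[i][j] records which subproblem produced c[i][j].
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "diag"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "up"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "left"
    return c, b
```

For the sequences of Figure 15.8, the entry c[7][6] of the returned table holds the
LCS length.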

Step  4:  Constructing  an  LCS 

Figure 15.8 The $c$ and $b$ tables computed by LCS-Length on the sequences
$X = \langle A, B, C, B, D, A, B \rangle$ and $Y = \langle B, D, C, A, B, A \rangle$. The
square in row $i$ and column $j$ contains the value of $c[i,j]$ and the appropriate
arrow for the value of $b[i,j]$. The entry 4 in $c[7,6]$, the lower right-hand corner
of the table, is the length of an LCS $\langle B, C, B, A \rangle$ of $X$ and $Y$. For
$i, j > 0$, entry $c[i,j]$ depends only on whether $x_i = y_j$ and the values in entries
$c[i-1,j]$, $c[i,j-1]$, and $c[i-1,j-1]$, which are computed before $c[i,j]$. To
reconstruct the elements of an LCS, follow the $b[i,j]$ arrows from the lower
right-hand corner; the sequence is shaded. Each "↖" on the shaded sequence corresponds
to an entry (highlighted) for which $x_i = y_j$ is a member of an LCS.

The $b$ table returned by LCS-Length enables us to quickly construct an LCS of
$X = \langle x_1, x_2, \ldots, x_m \rangle$ and $Y = \langle y_1, y_2, \ldots, y_n \rangle$.
We simply begin at $b[m,n]$ and trace through the table by following the arrows.
Whenever we encounter a "↖" in entry $b[i,j]$, it implies that $x_i = y_j$ is an element
of the LCS that LCS-Length found. With this method, we encounter the elements of this
LCS in reverse order. The following recursive procedure prints out an LCS of $X$ and
$Y$ in the proper, forward order. The initial call is
Print-LCS(b, X, X.length, Y.length).

Print-LCS(b, X, i, j)
    if i == 0 or j == 0
        return
    if b[i, j] == "↖"
        Print-LCS(b, X, i - 1, j - 1)
        print x_i
    elseif b[i, j] == "↑"
        Print-LCS(b, X, i - 1, j)
    else Print-LCS(b, X, i, j - 1)

For the $b$ table in Figure 15.8, this procedure prints BCBA. The procedure takes
time $O(m + n)$, since it decrements at least one of $i$ and $j$ in each recursive call.
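
The traceback can also be sketched in Python. To keep the example self-contained,
the hypothetical helper build_b below recomputes the $b$ table (with "diag", "up",
and "left" standing in for the three arrows), and lcs_string returns the LCS as a
string instead of printing it. Both names, and the 0-indexed inputs, are assumptions
of this sketch rather than part of the procedures above.

```python
def build_b(x, y):
    """Recompute the b table of LCS-Length for 0-indexed sequences x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, "diag"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], "up"
            else:
                c[i][j], b[i][j] = c[i][j - 1], "left"
    return b


def lcs_string(b, x, i, j):
    """Follow the arrows from b[i][j]; mirrors Print-LCS but returns a string."""
    if i == 0 or j == 0:
        return ""
    if b[i][j] == "diag":
        # x_i belongs to the LCS; emit it after the prefix's LCS (forward order).
        return lcs_string(b, x, i - 1, j - 1) + x[i - 1]
    if b[i][j] == "up":
        return lcs_string(b, x, i - 1, j)
    return lcs_string(b, x, i, j - 1)
```

A typical call is lcs_string(build_b(X, Y), X, len(X), len(Y)). Each recursive call
decreases i, j, or both, so the traceback itself makes at most m + n calls.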
Improving the code

Once  you  have  developed  an  algorithm,  you  will  often  find  that  you  can  improve 
on  the  time  or  space  it  uses.  Some  changes  can  simplify  the  code  and  improve 
constant  factors  but  otherwise  yield  no  asymptotic  improvement  in  performance. 
Others  can  yield  substantial  asymptotic  savings  in  time  and  space. 

In the LCS algorithm, for example, we can eliminate the $b$ table altogether. Each
$c[i,j]$ entry depends on only three other $c$ table entries: $c[i-1,j-1]$, $c[i-1,j]$,
and $c[i,j-1]$. Given the value of $c[i,j]$, we can determine in $O(1)$ time which of
these three values was used to compute $c[i,j]$, without inspecting table $b$. Thus, we
can reconstruct an LCS in $O(m + n)$ time using a procedure similar to Print-LCS.
(Exercise 15.4-2 asks you to give the pseudocode.) Although we save $\Theta(mn)$ space
by this method, the auxiliary space requirement for computing an LCS does not
asymptotically decrease, since we need $\Theta(mn)$ space for the $c$ table anyway.

We can, however, reduce the asymptotic space requirements for LCS-Length,
since it needs only two rows of table $c$ at a time: the row being computed and the
previous row. (In fact, as Exercise 15.4-4 asks you to show, we can use only slightly
more than the space for one row of $c$ to compute the length of an LCS.) This
improvement works if we need only the length of an LCS; if we need to reconstruct
the elements of an LCS, the smaller table does not keep enough information to
retrace our steps in $O(m + n)$ time.
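
The two-row idea can be sketched in Python as follows. The function name
lcs_length_two_rows is an assumption of this example, and the sketch deliberately
stops at the plain two-row version: it keeps two full rows of n + 1 entries and
makes no attempt at the further reductions that the exercises ask about.

```python
def lcs_length_two_rows(x, y):
    """LCS length using only the previous and current rows of the c table."""
    n = len(y)
    prev = [0] * (n + 1)          # row i - 1 of c (initially row 0, all zeros)
    for xi in x:
        curr = [0] * (n + 1)      # row i of c; curr[0] plays the role of c[i, 0]
        for j in range(1, n + 1):
            if xi == y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr               # row i becomes the "previous" row
    return prev[n]
```

Because earlier rows are discarded, this version returns only the length; it cannot
support the arrow-following traceback.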

Exercises 


15.4-1

Determine an LCS of $\langle 1, 0, 0, 1, 0, 1, 0, 1 \rangle$ and
$\langle 0, 1, 0, 1, 1, 0, 1, 1, 0 \rangle$.


15.4-2

Give pseudocode to reconstruct an LCS from the completed $c$ table and the original
sequences $X = \langle x_1, x_2, \ldots, x_m \rangle$ and
$Y = \langle y_1, y_2, \ldots, y_n \rangle$ in $O(m + n)$ time, without using the $b$ table.


15.4-3

Give a memoized version of LCS-Length that runs in $O(mn)$ time.

15.4-4

Show how to compute the length of an LCS using only $2 \cdot \min(m, n)$ entries in the
$c$ table plus $O(1)$ additional space. Then show how to do the same thing, but using
$\min(m, n)$ entries plus $O(1)$ additional space.

15.4-5

Give an $O(n^2)$-time algorithm to find the longest monotonically increasing
subsequence of a sequence of $n$ numbers.

15.4-6 ⋆

Give an $O(n \lg n)$-time algorithm to find the longest monotonically increasing
subsequence of a sequence of $n$ numbers. (Hint: Observe that the last element of a
candidate subsequence of length $i$ is at least as large as the last element of a
candidate subsequence of length $i - 1$. Maintain candidate subsequences by linking
them through the input sequence.)
15.5 Optimal binary search trees

Suppose that we are designing a program to translate text from English to French.
For each occurrence of each English word in the text, we need to look up its French
equivalent. We could perform these lookup operations by building a binary search
tree with $n$ English words as keys and their French equivalents as satellite data.
Because we will search the tree for each individual word in the text, we want the
total time spent searching to be as low as possible. We could ensure an $O(\lg n)$
search time per occurrence by using a red-black tree or any other balanced binary
search tree. Words appear with different frequencies, however, and a frequently
used word such as "the" may appear far from the root while a rarely used word such
as "machicolation" appears near the root. Such an organization would slow down the
translation, since the number of nodes visited when searching for a key in a binary
search tree equals one plus the depth of the node containing the key. We want
words that occur frequently in the text to be placed nearer the root.⁶ Moreover,
some words in the text might have no French translation,⁷ and such words would
not appear in the binary search tree at all. How do we organize a binary search tree
so as to minimize the number of nodes visited in all searches, given that we know
how often each word occurs?

What we need is known as an optimal binary search tree. Formally, we are
given a sequence $K = \langle k_1, k_2, \ldots, k_n \rangle$ of $n$ distinct keys in sorted
order (so that $k_1 < k_2 < \cdots < k_n$), and we wish to build a binary search tree
from these keys. For each key $k_i$, we have a probability $p_i$ that a search will be
for $k_i$. Some searches may be for values not in $K$, and so we also have $n + 1$
"dummy keys" $d_0, d_1, d_2, \ldots, d_n$ representing values not in $K$. In particular,
$d_0$ represents all values less than $k_1$, $d_n$ represents all values greater than
$k_n$, and for $i = 1, 2, \ldots, n - 1$, the dummy key $d_i$ represents all values
between $k_i$ and $k_{i+1}$. For each dummy key $d_i$, we have a probability $q_i$ that a
search will correspond to $d_i$. Figure 15.9 shows two binary search trees for a set
of $n = 5$ keys. Each key $k_i$ is an internal node, and each dummy key $d_i$ is a leaf.
Every search is either successful (finding some key $k_i$) or unsuccessful (finding
some dummy key $d_i$), and so we have

$$
\sum_{i=1}^{n} p_i + \sum_{i=0}^{n} q_i = 1 . \tag{15.10}
$$

6 If the subject of the text is castle architecture, we might want machicolation to
appear near the root.

7 Yes, machicolation has a French counterpart: machicoulis.

Figure 15.9 Two binary search trees for a set of $n = 5$ keys with the following
probabilities:

    i      0     1     2     3     4     5
    p_i          0.15  0.10  0.05  0.10  0.20
    q_i    0.05  0.10  0.05  0.05  0.05  0.10

(a) A binary search tree with expected search cost 2.80. (b) A binary search tree
with expected search cost 2.75. This tree is optimal.

Because we have probabilities of searches for each key and each dummy key,
we can determine the expected cost of a search in a given binary search tree $T$. Let
us assume that the actual cost of a search equals the number of nodes examined,
i.e., the depth of the node found by the search in $T$, plus 1. Then the expected cost
of a search in $T$ is

$$
\begin{aligned}
\mathrm{E}[\text{search cost in } T]
  &= \sum_{i=1}^{n} (\mathrm{depth}_T(k_i) + 1) \cdot p_i
   + \sum_{i=0}^{n} (\mathrm{depth}_T(d_i) + 1) \cdot q_i \\
  &= 1 + \sum_{i=1}^{n} \mathrm{depth}_T(k_i) \cdot p_i
       + \sum_{i=0}^{n} \mathrm{depth}_T(d_i) \cdot q_i ,
\end{aligned}
\tag{15.11}
$$
where $\mathrm{depth}_T$ denotes a node's depth in the tree $T$. The last equality
follows from equation (15.10). In Figure 15.9(a), we can calculate the expected
search cost node by node:


    node    depth   probability   contribution
    k_1       1        0.15           0.30
    k_2       0        0.10           0.10
    k_3       2        0.05           0.15
    k_4       1        0.10           0.20
    k_5       2        0.20           0.60
    d_0       2        0.05           0.15
    d_1       2        0.10           0.30
    d_2       3        0.05           0.20
    d_3       3        0.05           0.20
    d_4       3        0.05           0.20
    d_5       3        0.10           0.40
    Total                             2.80
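
The node-by-node calculation is easy to check mechanically. The following Python
sketch sums (depth + 1) times probability over the nodes of the tree in
Figure 15.9(a), using the depths and probabilities listed in the table above. The
names TREE_A and expected_search_cost are hypothetical and introduced only for this
illustration.

```python
# (depth, probability) for each node of the tree in Figure 15.9(a).
TREE_A = {
    "k1": (1, 0.15), "k2": (0, 0.10), "k3": (2, 0.05),
    "k4": (1, 0.10), "k5": (2, 0.20),
    "d0": (2, 0.05), "d1": (2, 0.10), "d2": (3, 0.05),
    "d3": (3, 0.05), "d4": (3, 0.05), "d5": (3, 0.10),
}


def expected_search_cost(nodes):
    """Expected number of nodes examined: the sum of (depth + 1) * probability."""
    return sum((depth + 1) * prob for depth, prob in nodes.values())
```

Up to floating-point rounding, expected_search_cost(TREE_A) agrees with the total
of 2.80 in the table.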

For  a  given  set  of  probabilities,  we  wish  to  construct  a  binary  search  tree  whose 
expected  search  cost  is  smallest.  We  call  such  a  tree  an  optimal  binary  search  tree. 
Figure  15.9(b)  shows  an  optimal  binary  search  tree  for  the  probabilities  given  in 
the  figure  caption;  its  expected  cost  is  2.75.  This  example  shows  that  an  optimal 
binary  search  tree  is  not  necessarily  a  tree  whose  overall  height  is  smallest.  Nor 
can  we  necessarily  construct  an  optimal  binary  search  tree  by  always  putting  the 
key  with  the  greatest  probability  at  the  root.  Here,  key  k5  has  the  greatest  search 
probability  of  any  key,  yet  the  root  of  the  optimal  binary  search  tree  shown  is  k2. 
(The  lowest  expected  cost  of  any  binary  search  tree  with  k5  at  the  root  is  2.85.) 

As with matrix-chain multiplication, exhaustive checking of all possibilities fails
to yield an efficient algorithm. We can label the nodes of any $n$-node binary tree
with the keys $k_1, k_2, \ldots, k_n$ to construct a binary search tree, and then add in
the dummy keys as leaves. In Problem 12-4, we saw that the number of binary trees
with $n$ nodes is $\Omega(4^n / n^{3/2})$, and so we would have to examine an exponential
number of binary search trees in an exhaustive search. Not surprisingly, we shall
solve this problem with dynamic programming.


Step  1:  The  structure  of  an  optimal  binary  search  tree 

To characterize the optimal substructure of optimal binary search trees, we start
with an observation about subtrees. Consider any subtree of a binary search tree.
It must contain keys in a contiguous range $k_i, \ldots, k_j$, for some
$1 \le i \le j \le n$. In addition, a subtree that contains keys $k_i, \ldots, k_j$ must
also have as its leaves the dummy keys $d_{i-1}, \ldots, d_j$.

Now we can state the optimal substructure: if an optimal binary search tree $T$
has a subtree $T'$ containing keys $k_i, \ldots, k_j$, then this subtree $T'$ must be
optimal as well for the subproblem with keys $k_i, \ldots, k_j$ and dummy keys
$d_{i-1}, \ldots, d_j$. The usual cut-and-paste argument applies. If there were a
subtree $T''$ whose expected cost is lower than that of $T'$, then we could cut $T'$
out of $T$ and paste in $T''$, resulting in a binary search tree of lower expected
cost than $T$, thus contradicting the optimality of $T$.

We need to use the optimal substructure to show that we can construct an optimal
solution to the problem from optimal solutions to subproblems. Given keys
$k_i, \ldots, k_j$, one of these keys, say $k_r$ ($i \le r \le j$), is the root of an
optimal subtree containing these keys. The left subtree of the root $k_r$ contains
the keys $k_i, \ldots, k_{r-1}$ (and dummy keys $d_{i-1}, \ldots, d_{r-1}$), and the right
subtree contains the keys $k_{r+1}, \ldots, k_j$ (and dummy keys $d_r, \ldots, d_j$). As
long as we examine all candidate roots $k_r$, where $i \le r \le j$, and we determine
all optimal binary search trees containing $k_i, \ldots, k_{r-1}$ and those containing
$k_{r+1}, \ldots, k_j$, we are guaranteed that we will find an optimal binary search
tree.

There is one detail worth noting about "empty" subtrees. Suppose that in a
subtree with keys $k_i, \ldots, k_j$, we select $k_i$ as the root. By the above argument,
$k_i$'s left subtree contains the keys $k_i, \ldots, k_{i-1}$. We interpret this sequence
as containing no keys. Bear in mind, however, that subtrees also contain dummy keys.
We adopt the convention that a subtree containing keys $k_i, \ldots, k_{i-1}$ has no
actual keys but does contain the single dummy key $d_{i-1}$. Symmetrically, if we
select $k_j$ as the root, then $k_j$'s right subtree contains the keys
$k_{j+1}, \ldots, k_j$; this right subtree contains no actual keys, but it does contain
the dummy key $d_j$.

Step  2:  A  recursive  solution 

We are ready to define the value of an optimal solution recursively. We pick our
subproblem domain as finding an optimal binary search tree containing the keys
$k_i, \ldots, k_j$, where $i \ge 1$, $j \le n$, and $j \ge i - 1$. (When $j = i - 1$, there
are no actual keys; we have just the dummy key $d_{i-1}$.) Let us define $e[i,j]$ as
the expected cost of searching an optimal binary search tree containing the keys
$k_i, \ldots, k_j$. Ultimately, we wish to compute $e[1,n]$.

The easy case occurs when $j = i - 1$. Then we have just the dummy key $d_{i-1}$.
The expected search cost is $e[i, i-1] = q_{i-1}$.

When $j \ge i$, we need to select a root $k_r$ from among $k_i, \ldots, k_j$ and then
make an optimal binary search tree with keys $k_i, \ldots, k_{r-1}$ as its left subtree
and an optimal binary search tree with keys $k_{r+1}, \ldots, k_j$ as its right subtree.
What happens to the expected search cost of a subtree when it becomes a subtree of a
node? The depth of each node in the subtree increases by 1. By equation (15.11), the
expected search cost of this subtree increases by the sum of all the probabilities
in the subtree. For a subtree with keys $k_i, \ldots, k_j$, let us denote this sum of
probabilities as

$$
w(i,j) = \sum_{l=i}^{j} p_l + \sum_{l=i-1}^{j} q_l . \tag{15.12}
$$


Thus, if $k_r$ is the root of an optimal subtree containing keys $k_i, \ldots, k_j$, we
have

$$
e[i,j] = p_r + (e[i,r-1] + w(i,r-1)) + (e[r+1,j] + w(r+1,j)) .
$$

Noting that

$$
w(i,j) = w(i,r-1) + p_r + w(r+1,j) ,
$$

we rewrite $e[i,j]$ as

$$
e[i,j] = e[i,r-1] + e[r+1,j] + w(i,j) . \tag{15.13}
$$

The recursive equation (15.13) assumes that we know which node $k_r$ to use as
the root. We choose the root that gives the lowest expected search cost, giving us
our final recursive formulation:


$$
e[i,j] =
\begin{cases}
q_{i-1} & \text{if } j = i - 1, \\
\min\limits_{i \le r \le j} \{ e[i,r-1] + e[r+1,j] + w(i,j) \} & \text{if } i \le j.
\end{cases}
\tag{15.14}
$$


The $e[i,j]$ values give the expected search costs in optimal binary search trees.
To help us keep track of the structure of optimal binary search trees, we define
$root[i,j]$, for $1 \le i \le j \le n$, to be the index $r$ for which $k_r$ is the root of
an optimal binary search tree containing keys $k_i, \ldots, k_j$. Although we will see
how to compute the values of $root[i,j]$, we leave the construction of an optimal
binary search tree from these values as Exercise 15.5-1.


Step  3:  Computing  the  expected  search  cost  of  an  optimal  binary  search  tree 

At this point, you may have noticed some similarities between our characterizations
of optimal binary search trees and matrix-chain multiplication. For both problem
domains, our subproblems consist of contiguous index subranges. A direct, recursive
implementation of equation (15.14) would be as inefficient as a direct, recursive
matrix-chain multiplication algorithm. Instead, we store the $e[i,j]$ values in a
table $e[1..n+1, 0..n]$. The first index needs to run to $n + 1$ rather than $n$ because
in order to have a subtree containing only the dummy key $d_n$, we need to compute
and store $e[n+1, n]$. The second index needs to start from 0 because in order to
have a subtree containing only the dummy key $d_0$, we need to compute and store
$e[1, 0]$. We use only the entries $e[i,j]$ for which $j \ge i - 1$. We also use a table
$root[i,j]$, for recording the root of the subtree containing keys $k_i, \ldots, k_j$.
This table uses only the entries for which $1 \le i \le j \le n$.

We will need one other table for efficiency. Rather than compute the value
of $w(i,j)$ from scratch every time we are computing $e[i,j]$ (which would take
$\Theta(j - i)$ additions), we store these values in a table $w[1..n+1, 0..n]$. For the
base case, we compute $w[i, i-1] = q_{i-1}$ for $1 \le i \le n + 1$. For $j \ge i$, we
compute

$$
w[i,j] = w[i,j-1] + p_j + q_j . \tag{15.15}
$$

Thus, we can compute the $\Theta(n^2)$ values of $w[i,j]$ in $\Theta(1)$ time each.

The pseudocode that follows takes as inputs the probabilities $p_1, \ldots, p_n$ and
$q_0, \ldots, q_n$ and the size $n$, and it returns the tables $e$ and $root$.

Optimal-BST(p, q, n)
 1  let e[1..n+1, 0..n], w[1..n+1, 0..n],
            and root[1..n, 1..n] be new tables
 2  for i = 1 to n + 1
 3      e[i, i - 1] = q_{i-1}
 4      w[i, i - 1] = q_{i-1}
 5  for l = 1 to n
 6      for i = 1 to n - l + 1
 7          j = i + l - 1
 8          e[i, j] = ∞
 9          w[i, j] = w[i, j - 1] + p_j + q_j
10          for r = i to j
11              t = e[i, r - 1] + e[r + 1, j] + w[i, j]
12              if t < e[i, j]
13                  e[i, j] = t
14                  root[i, j] = r
15  return e and root

From the description above and the similarity to the Matrix-Chain-Order procedure
in Section 15.2, you should find the operation of this procedure to be fairly
straightforward. The for loop of lines 2-4 initializes the values of $e[i, i-1]$
and $w[i, i-1]$. The for loop of lines 5-14 then uses the recurrences (15.14)
and (15.15) to compute $e[i,j]$ and $w[i,j]$ for all $1 \le i \le j \le n$. In the first
iteration, when $l = 1$, the loop computes $e[i,i]$ and $w[i,i]$ for $i = 1, 2, \ldots, n$.
The second iteration, with $l = 2$, computes $e[i,i+1]$ and $w[i,i+1]$ for
$i = 1, 2, \ldots, n - 1$, and so forth. The innermost for loop, in lines 10-14, tries
each candidate index $r$ to determine which key $k_r$ to use as the root of an optimal
binary search tree containing keys $k_i, \ldots, k_j$. This for loop saves the current
value of the index $r$ in $root[i,j]$ whenever it finds a better key to use as the root.

Figure 15.10 shows the tables $e[i,j]$, $w[i,j]$, and $root[i,j]$ computed by the
procedure Optimal-BST on the key distribution shown in Figure 15.9. As in the
matrix-chain multiplication example of Figure 15.5, the tables are rotated to make
the diagonals run horizontally. Optimal-BST computes the rows from bottom to
top and from left to right within each row.

Figure 15.10 The tables $e[i,j]$, $w[i,j]$, and $root[i,j]$ computed by Optimal-BST on
the key distribution shown in Figure 15.9. The tables are rotated so that the
diagonals run horizontally.

The Optimal-BST procedure takes $\Theta(n^3)$ time, just like Matrix-Chain-Order.
We can easily see that its running time is $O(n^3)$, since its for loops are
nested three deep and each loop index takes on at most $n$ values. The loop indices in
Optimal-BST do not have exactly the same bounds as those in Matrix-Chain-Order,
but they are within at most 1 in all directions. Thus, like Matrix-Chain-Order,
the Optimal-BST procedure takes $\Omega(n^3)$ time.
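
For readers who want to experiment, the procedure can be sketched in Python as
follows. The sketch keeps the 1-based indexing of the pseudocode by padding the
lists, which is an assumption of this example: p[0] is an unused placeholder, and
row 0 of each table is never read. The function name optimal_bst is likewise
introduced only for illustration.

```python
import math


def optimal_bst(p, q, n):
    """Return tables (e, root) for key probabilities p[1..n] and dummy-key
    probabilities q[0..n]; p[0] is an unused placeholder."""
    # Rows 1..n+1 and columns 0..n are used, as in the pseudocode.
    e = [[0.0] * (n + 1) for _ in range(n + 2)]
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 2):
        # Empty key range: only the dummy key d_{i-1}.
        e[i][i - 1] = q[i - 1]
        w[i][i - 1] = q[i - 1]
    for length in range(1, n + 1):            # number of keys in the subtree
        for i in range(1, n - length + 2):
            j = i + length - 1
            e[i][j] = math.inf
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            for r in range(i, j + 1):         # try each key k_r as the root
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j] = t
                    root[i][j] = r
    return e, root
```

With the probabilities of Figure 15.9, that is, p = [0, 0.15, 0.10, 0.05, 0.10, 0.20]
and q = [0.05, 0.10, 0.05, 0.05, 0.05, 0.10], the entry e[1][5] matches the optimal
expected cost of 2.75 up to floating-point rounding. Because the costs are floats,
comparisons between candidate roots of nearly equal cost may be sensitive to
rounding; exact arithmetic (for example, fractions.Fraction) avoids this.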

Exercises 

15.5-1 

Write pseudocode for the procedure Construct-Optimal-BST(root) which,
given the table root, outputs the structure of an optimal binary search tree. For the
example in Figure 15.10, your procedure should print out the structure

    k_2 is the root
    k_1 is the left child of k_2
    d_0 is the left child of k_1
    d_1 is the right child of k_1
    k_5 is the right child of k_2
    k_4 is the left child of k_5
    k_3 is the left child of k_4
    d_2 is the left child of k_3
    d_3 is the right child of k_3
    d_4 is the right child of k_4
    d_5 is the right child of k_5

corresponding to the optimal binary search tree shown in Figure 15.9(b).

15.5-2 

Determine the cost and structure of an optimal binary search tree for a set of $n = 7$
keys with the following probabilities:

    i      0     1     2     3     4     5     6     7
    p_i          0.04  0.06  0.08  0.02  0.10  0.12  0.14
    q_i    0.06  0.06  0.06  0.06  0.05  0.05  0.05  0.05

15.5-3

Suppose that instead of maintaining the table $w[i,j]$, we computed the value
of $w(i,j)$ directly from equation (15.12) in line 9 of Optimal-BST and used this
computed value in line 11. How would this change affect the asymptotic running
time of Optimal-BST?

15.5-4 ⋆

Knuth [212] has shown that there are always roots of optimal subtrees such that
$root[i,j-1] \le root[i,j] \le root[i+1,j]$ for all $1 \le i < j \le n$. Use this fact to
modify the Optimal-BST procedure to run in $\Theta(n^2)$ time.


Problems 


15-1  Longest  simple  path  in  a  directed  acyclic  graph 

Suppose that we are given a directed acyclic graph $G = (V, E)$ with real-valued
edge weights and two distinguished vertices $s$ and $t$. Describe a
dynamic-programming approach for finding a longest weighted simple path from $s$ to $t$.
What does the subproblem graph look like? What is the efficiency of your algorithm?

Figure 15.11 Seven points in the plane, shown on a unit grid. (a) The shortest closed
tour, with length approximately 24.89. This tour is not bitonic. (b) The shortest
bitonic tour for the same set of points. Its length is approximately 25.58.

15-2  Longest  palindrome  subsequence 

A palindrome is a nonempty string over some alphabet that reads the same forward
and backward. Examples of palindromes are all strings of length 1, civic,
racecar, and aibohphobia (fear of palindromes).

Give  an  efficient  algorithm  to  find  the  longest  palindrome  that  is  a  subsequence 
of  a  given  input  string.  For  example,  given  the  input  character,  your  algorithm 
should  return  carac.  What  is  the  running  time  of  your  algorithm? 

15-3  Bitonic  euclidean  traveling-salesman  problem 

In the euclidean traveling-salesman problem, we are given a set of $n$ points in
the plane, and we wish to find the shortest closed tour that connects all $n$ points.
Figure 15.11(a) shows the solution to a 7-point problem. The general problem is
NP-hard, and its solution is therefore believed to require more than polynomial
time (see Chapter 34).

J. L. Bentley has suggested that we simplify the problem by restricting our
attention to bitonic tours, that is, tours that start at the leftmost point, go
strictly rightward to the rightmost point, and then go strictly leftward back to the
starting point. Figure 15.11(b) shows the shortest bitonic tour of the same 7 points.
In this case, a polynomial-time algorithm is possible.

Describe an $O(n^2)$-time algorithm for determining an optimal bitonic tour. You
may assume that no two points have the same $x$-coordinate and that all operations
on real numbers take unit time. (Hint: Scan left to right, maintaining optimal
possibilities for the two parts of the tour.)

15-4  Printing  neatly 

Consider the problem of neatly printing a paragraph with a monospaced font (all
characters having the same width) on a printer. The input text is a sequence of $n$
words of lengths $l_1, l_2, \ldots, l_n$, measured in characters. We want to print this
paragraph neatly on a number of lines that hold a maximum of $M$ characters each. Our
criterion of "neatness" is as follows. If a given line contains words $i$ through $j$,
where $i \le j$, and we leave exactly one space between words, the number of extra
space characters at the end of the line is $M - j + i - \sum_{k=i}^{j} l_k$, which must
be nonnegative so that the words fit on the line. We wish to minimize the sum, over
all lines except the last, of the cubes of the numbers of extra space characters at
the ends of lines. Give a dynamic-programming algorithm to print a paragraph of $n$
words neatly on a printer. Analyze the running time and space requirements of
your algorithm.

15-5  Edit  distance 

In order to transform one source string of text $x[1..m]$ to a target string $y[1..n]$,
we can perform various transformation operations. Our goal is, given $x$ and $y$,
to produce a series of transformations that change $x$ to $y$. We use an array $z$
(assumed to be large enough to hold all the characters it will need) to hold
the intermediate results. Initially, $z$ is empty, and at termination, we should have
$z[j] = y[j]$ for $j = 1, 2, \ldots, n$. We maintain current indices $i$ into $x$ and $j$
into $z$, and the operations are allowed to alter $z$ and these indices. Initially,
$i = j = 1$. We are required to examine every character in $x$ during the
transformation, which means that at the end of the sequence of transformation
operations, we must have $i = m + 1$.

We  may  choose  from  among  six  transformation  operations: 

Copy a character from $x$ to $z$ by setting $z[j] = x[i]$ and then incrementing both $i$
and $j$. This operation examines $x[i]$.

Replace a character from $x$ by another character $c$, by setting $z[j] = c$, and then
incrementing both $i$ and $j$. This operation examines $x[i]$.

Delete a character from $x$ by incrementing $i$ but leaving $j$ alone. This operation
examines $x[i]$.

Insert the character $c$ into $z$ by setting $z[j] = c$ and then incrementing $j$, but
leaving $i$ alone. This operation examines no characters of $x$.

Twiddle (i.e., exchange) the next two characters by copying them from $x$ to $z$ but
in the opposite order; we do so by setting $z[j] = x[i+1]$ and $z[j+1] = x[i]$
and then setting $i = i + 2$ and $j = j + 2$. This operation examines $x[i]$
and $x[i+1]$.

Kill the remainder of $x$ by setting $i = m + 1$. This operation examines all
characters in $x$ that have not yet been examined. This operation, if performed, must
be the final operation.

As an example, one way to transform the source string algorithm to the target
string altruistic is to use the following sequence of operations, where the
bracketed character of x is x[i] and the underscore in z marks position z[j]
after the operation (after kill, i = m + 1 lies past the end of x):

Operation        x              z
initial strings  [a]lgorithm    _
copy             a[l]gorithm    a_
copy             al[g]orithm    al_
replace by t     alg[o]rithm    alt_
delete           algo[r]ithm    alt_
copy             algor[i]thm    altr_
insert u         algor[i]thm    altru_
insert i         algor[i]thm    altrui_
insert s         algor[i]thm    altruis_
twiddle          algorit[h]m    altruisti_
insert c         algorit[h]m    altruistic_
kill             algorithm[ ]   altruistic_

Note that there are several other sequences of transformation operations that
transform algorithm to altruistic.

Each  of  the  transformation  operations  has  an  associated  cost.  The  cost  of  an 
operation  depends  on  the  specific  application,  but  we  assume  that  each  operation’s 
cost  is  a  constant  that  is  known  to  us.  We  also  assume  that  the  individual  costs  of 
the  copy  and  replace  operations  are  less  than  the  combined  costs  of  the  delete  and 
insert  operations;  otherwise,  the  copy  and  replace  operations  would  not  be  used. 
The  cost  of  a  given  sequence  of  transformation  operations  is  the  sum  of  the  costs 
of  the  individual  operations  in  the  sequence.  For  the  sequence  above,  the  cost  of 
transforming  algorithm  to  altruistic  is 

(3 · cost(copy)) + cost(replace) + cost(delete) + (4 · cost(insert))
    + cost(twiddle) + cost(kill) .
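To make the mechanics of the six operations concrete, the following is a minimal Python sketch that replays an operation sequence and totals its cost. It is only an illustration, not a solution to the problem: the function name `apply_operations`, the tuple encoding of operations, and the numeric costs in `COSTS` are all assumptions made here (the costs merely satisfy the stated requirement that copy and replace each cost less than delete plus insert). The code uses 0-based indices, whereas the text uses 1-based indices.

```python
def apply_operations(x, ops, cost):
    """Replay edit operations on source string x; return (z, total_cost).

    ops is a list of tuples such as ("copy",), ("replace", "t"), ("insert", "u").
    """
    i = 0        # 0-based index of the next unexamined character of x
    z = []       # characters written so far; the text's j corresponds to len(z) + 1
    total = 0
    for n, op in enumerate(ops):
        name = op[0]
        total += cost[name]
        if name == "copy":
            z.append(x[i])
            i += 1
        elif name == "replace":
            z.append(op[1])
            i += 1
        elif name == "delete":
            i += 1
        elif name == "insert":
            z.append(op[1])
        elif name == "twiddle":
            z.extend([x[i + 1], x[i]])   # copy the next two characters in swapped order
            i += 2
        elif name == "kill":
            if n != len(ops) - 1:
                raise ValueError("kill must be the final operation")
            i = len(x)
        else:
            raise ValueError(f"unknown operation {name!r}")
    if i != len(x):
        raise ValueError("not every character of x was examined")
    return "".join(z), total


# Hypothetical costs, chosen only for illustration.
COSTS = {"copy": 1, "replace": 2, "delete": 2, "insert": 2, "twiddle": 3, "kill": 1}

# The operation sequence from the example above.
OPS = [("copy",), ("copy",), ("replace", "t"), ("delete",), ("copy",),
       ("insert", "u"), ("insert", "i"), ("insert", "s"),
       ("twiddle",), ("insert", "c"), ("kill",)]

result, total_cost = apply_operations("algorithm", OPS, COSTS)
```

With these assumed costs, the total is 3·1 + 2 + 2 + 4·2 + 3 + 1 = 19, which matches the cost expression above term by term.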

a. Given two sequences x[1..m] and y[1..n] and a set of transformation-operation
costs, the edit distance from x to y is the cost of the least expensive operation
sequence that transforms x to y. Describe a dynamic-programming algorithm
that finds the edit distance from x[1..m] to y[1..n] and prints an optimal
operation sequence. Analyze the running time and space requirements of your
algorithm.

The  edit-distance  problem  generalizes  the  problem  of  aligning  two  DNA  sequences 
(see,  for  example,  Setubal  and  Meidanis  [310,  Section  3.2]).  There  are  several 
methods  for  measuring  the  similarity  of  two  DNA  sequences  by  aligning  them. 
One such method to align two sequences x and y consists of inserting spaces at
arbitrary locations in the two sequences (including at either end) so that the
resulting sequences x' and y' have the same length but do not have a space in the same
position (i.e., for no position j are both x'[j] and y'[j] a space). Then we assign a
“score” to each position. Position j receives a score as follows:

• +1 if x'[j] = y'[j] and neither is a space,

• −1 if x'[j] ≠ y'[j] and neither is a space,

• −2 if either x'[j] or y'[j] is a space.

The  score  for  the  alignment  is  the  sum  of  the  scores  of  the  individual  positions.  For 
example,  given  the  sequences  x  =  GATCGGCAT  and  y  =  CAATGTGAATC,  one 
alignment  is 

G ATCG GCAT 
CAAT GTGAATC
-*++*+*+-++*

A + under a position indicates a score of +1 for that position, a - indicates a score
of −1, and a * indicates a score of −2, so that this alignment has a total score of
6 · 1 − 2 · 1 − 4 · 2 = −4.
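The scoring rule can be checked mechanically. The sketch below, with the hypothetical function name `alignment_score`, scores two already-aligned strings position by position; it assumes a space character marks an inserted gap, and that x' in the example carries a trailing space so that both strings have 12 positions.

```python
def alignment_score(xa, ya, gap=" "):
    """Score two aligned sequences of equal length under the +1 / -1 / -2 rule."""
    if len(xa) != len(ya):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(xa, ya):
        if a == gap and b == gap:
            raise ValueError("both sequences have a space at the same position")
        if a == gap or b == gap:
            score -= 2          # one side is a space
        elif a == b:
            score += 1          # match
        else:
            score -= 1          # mismatch
    return score


# The example alignment of GATCGGCAT and CAATGTGAATC.
X_ALIGNED = "G ATCG GCAT "
Y_ALIGNED = "CAAT GTGAATC"
example_score = alignment_score(X_ALIGNED, Y_ALIGNED)
```

For the example alignment this yields 6 matches, 2 mismatches, and 4 positions with a space.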

b.  Explain  how  to  cast  the  problem  of  finding  an  optimal  alignment  as  an  edit 
distance  problem  using  a  subset  of  the  transformation  operations  copy,  replace, 
delete,  insert,  twiddle,  and  kill. 

15-6  Planning  a  company  party 

Professor  Stewart  is  consulting  for  the  president  of  a  corporation  that  is  planning 
a  company  party.  The  company  has  a  hierarchical  structure;  that  is,  the  supervisor 
relation  forms  a  tree  rooted  at  the  president.  The  personnel  office  has  ranked  each 
employee  with  a  conviviality  rating,  which  is  a  real  number.  In  order  to  make  the 
party  fun  for  all  attendees,  the  president  does  not  want  both  an  employee  and  his 
or  her  immediate  supervisor  to  attend. 

Professor  Stewart  is  given  the  tree  that  describes  the  structure  of  the  corporation, 
using  the  left-child,  right-sibling  representation  described  in  Section  10.4.  Each 
node  of  the  tree  holds,  in  addition  to  the  pointers,  the  name  of  an  employee  and 
that  employee’s  conviviality  ranking.  Describe  an  algorithm  to  make  up  a  guest 
list  that  maximizes  the  sum  of  the  conviviality  ratings  of  the  guests.  Analyze  the 
running  time  of  your  algorithm. 

15-7  Viterbi  algorithm 

We can use dynamic programming on a directed graph G = (V, E) for speech
recognition. Each edge (u, v) ∈ E is labeled with a sound σ(u, v) from a finite
set Σ of sounds. The labeled graph is a formal model of a person speaking
a restricted language. Each path in the graph starting from a distinguished
vertex v_0 ∈ V corresponds to a possible sequence of sounds produced by the model.
We define the label of a directed path to be the concatenation of the labels of the
edges on that path.

a. Describe an efficient algorithm that, given an edge-labeled graph G with
distinguished vertex v_0 and a sequence s = ⟨σ_1, σ_2, ..., σ_k⟩ of sounds from Σ,
returns a path in G that begins at v_0 and has s as its label, if any such path exists.
Otherwise, the algorithm should return NO-SUCH-PATH. Analyze the running
time of your algorithm. (Hint: You may find concepts from Chapter 22 useful.)

Now, suppose that every edge (u, v) ∈ E has an associated nonnegative
probability p(u, v) of traversing the edge (u, v) from vertex u and thus producing the
corresponding sound. The sum of the probabilities of the edges leaving any vertex
equals 1. The probability of a path is defined to be the product of the
probabilities of its edges. We can view the probability of a path beginning at v_0 as the
probability that a “random walk” beginning at v_0 will follow the specified path,
where we randomly choose which edge to take leaving a vertex u according to the
probabilities of the available edges leaving u.

b. Extend your answer to part (a) so that if a path is returned, it is a most
probable path starting at v_0 and having label s. Analyze the running time of your
algorithm.

15-8  Image  compression  by  seam  carving 

We are given a color picture consisting of an m × n array A[1..m, 1..n] of pixels,
where each pixel specifies a triple of red, green, and blue (RGB) intensities.
Suppose that we wish to compress this picture slightly. Specifically, we wish to remove
one pixel from each of the m rows, so that the whole picture becomes one pixel
narrower. To avoid disturbing visual effects, however, we require that the pixels
removed in two adjacent rows be in the same or adjacent columns; the pixels
removed form a “seam” from the top row to the bottom row where successive pixels
in the seam are adjacent vertically or diagonally.

a. Show that the number of such possible seams grows at least exponentially in m,
assuming that n > 1.

b. Suppose now that along with each pixel A[i,j], we have calculated a
real-valued disruption measure d[i,j], indicating how disruptive it would be to
remove pixel A[i,j]. Intuitively, the lower a pixel’s disruption measure, the
more similar the pixel is to its neighbors. Suppose further that we define the
disruption measure of a seam to be the sum of the disruption measures of its
pixels.

Give an algorithm to find a seam with the lowest disruption measure. How
efficient is your algorithm?

15-9  Breaking  a  string 

A  certain  string-processing  language  allows  a  programmer  to  break  a  string  into 
two  pieces.  Because  this  operation  copies  the  string,  it  costs  n  time  units  to  break 
a  string  of  n  characters  into  two  pieces.  Suppose  a  programmer  wants  to  break 
a  string  into  many  pieces.  The  order  in  which  the  breaks  occur  can  affect  the 
total  amount  of  time  used.  For  example,  suppose  that  the  programmer  wants  to 
break  a  20-character  string  after  characters  2,  8,  and  10  (numbering  the  characters 
in  ascending  order  from  the  left-hand  end,  starting  from  1).  If  she  programs  the 
breaks  to  occur  in  left-to-right  order,  then  the  first  break  costs  20  time  units,  the 
second  break  costs  18  time  units  (breaking  the  string  from  characters  3  to  20  at 
character  8),  and  the  third  break  costs  12  time  units,  totaling  50  time  units.  If  she 
programs  the  breaks  to  occur  in  right-to-left  order,  however,  then  the  first  break 
costs  20  time  units,  the  second  break  costs  10  time  units,  and  the  third  break  costs 
8  time  units,  totaling  38  time  units.  In  yet  another  order,  she  could  break  first  at  8 
(costing  20),  then  break  the  left  piece  at  2  (costing  8),  and  finally  the  right  piece 
at  10  (costing  12),  for  a  total  cost  of  40. 
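The arithmetic in this example can be reproduced with a short simulation. The sketch below only evaluates the cost of one given order of breaks; it does not search for the best order, which is what the problem asks for. The function name `break_cost` and the representation of pieces by a sorted list of boundaries are assumptions of this sketch.

```python
import bisect


def break_cost(n, order):
    """Total cost of breaking an n-character string at the given points, in the given order.

    A break point p means "break after character p" (1-based), so 0 < p < n.
    """
    cuts = [0, n]      # sorted boundaries; each piece spans characters cuts[k-1]+1 .. cuts[k]
    total = 0
    for p in order:
        if not 0 < p < n:
            raise ValueError(f"break point {p} is out of range")
        k = bisect.bisect_left(cuts, p)
        if cuts[k] == p:
            raise ValueError(f"break point {p} was already used")
        total += cuts[k] - cuts[k - 1]   # breaking a piece costs its current length
        cuts.insert(k, p)
    return total


left_to_right = break_cost(20, [2, 8, 10])
right_to_left = break_cost(20, [10, 8, 2])
middle_first = break_cost(20, [8, 2, 10])
```

The three orders give 50, 38, and 40 time units respectively, in agreement with the figures above.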

Design  an  algorithm  that,  given  the  numbers  of  characters  after  which  to  break, 
determines  a  least-cost  way  to  sequence  those  breaks.  More  formally,  given  a 
string S with n characters and an array L[1..m] containing the break points,
compute the lowest cost for a sequence of breaks, along with a sequence of breaks that
achieves  this  cost. 

15-10  Planning  an  investment  strategy 

Your  knowledge  of  algorithms  helps  you  obtain  an  exciting  job  with  the  Acme 
Computer  Company,  along  with  a  $10,000  signing  bonus.  You  decide  to  invest 
this  money  with  the  goal  of  maximizing  your  return  at  the  end  of  10  years.  You 
decide  to  use  the  Amalgamated  Investment  Company  to  manage  your  investments. 
Amalgamated Investments requires you to observe the following rules. It offers n
different investments, numbered 1 through n. In each year j, investment i provides
a return rate of r_{ij}. In other words, if you invest d dollars in investment i in year j,
then at the end of year j, you have d·r_{ij} dollars. The return rates are guaranteed,
that is, you are given all the return rates for the next 10 years for each investment.
You make investment decisions only once per year. At the end of each year, you
can leave the money made in the previous year in the same investments, or you
can shift money to other investments, by either shifting money between existing
investments or moving money to a new investment. If you do not move your
money between two consecutive years, you pay a fee of f_1 dollars, whereas if you
switch your money, you pay a fee of f_2 dollars, where f_2 > f_1.

a. The problem, as stated, allows you to invest your money in multiple investments
in  each  year.  Prove  that  there  exists  an  optimal  investment  strategy  that,  in 
each  year,  puts  all  the  money  into  a  single  investment.  (Recall  that  an  optimal 
investment  strategy  maximizes  the  amount  of  money  after  10  years  and  is  not 
concerned  with  any  other  objectives,  such  as  minimizing  risk.) 

b.  Prove  that  the  problem  of  planning  your  optimal  investment  strategy  exhibits 
optimal  substructure. 

c.  Design  an  algorithm  that  plans  your  optimal  investment  strategy.  What  is  the 
running  time  of  your  algorithm? 

d.  Suppose  that  Amalgamated  Investments  imposed  the  additional  restriction  that, 
at  any  point,  you  can  have  no  more  than  $15,000  in  any  one  investment.  Show 
that  the  problem  of  maximizing  your  income  at  the  end  of  10  years  no  longer 
exhibits  optimal  substructure. 

15-11  Inventory  planning 

The Rinky Dink Company makes machines that resurface ice rinks. The demand
for such products varies from month to month, and so the company needs to
develop a strategy to plan its manufacturing given the fluctuating, but predictable,
demand. The company wishes to design a plan for the next n months. For each
month i, the company knows the demand d_i, that is, the number of machines that
it will sell. Let D = Σ_{i=1}^{n} d_i be the total demand over the next n months. The
company keeps a full-time staff who provide labor to manufacture up to m
machines per month. If the company needs to make more than m machines in a given
month, it can hire additional, part-time labor, at a cost that works out to c dollars
per machine. Furthermore, if, at the end of a month, the company is holding any
unsold machines, it must pay inventory costs. The cost for holding j machines is
given as a function h(j) for j = 1, 2, ..., D, where h(j) ≥ 0 for 1 ≤ j ≤ D and
h(j) ≤ h(j + 1) for 1 ≤ j ≤ D − 1.

Give an algorithm that calculates a plan for the company that minimizes its costs
while fulfilling all the demand. The running time should be polynomial in n and D.

15-12  Signing  free-agent  baseball  players 

Suppose  that  you  are  the  general  manager  for  a  major-league  baseball  team.  During 
the  off-season,  you  need  to  sign  some  free-agent  players  for  your  team.  The  team 
owner has given you a budget of $X to spend on free agents. You are allowed to
spend less than $X altogether, but the owner will fire you if you spend any more
than $X.

You are considering N different positions, and for each position, P free-agent
players who play that position are available.⁸ Because you do not want to overload
your roster with too many players at any position, for each position you may sign
at most one free agent who plays that position. (If you do not sign any players at a
particular position, then you plan to stick with the players you already have at that
position.)

To determine how valuable a player is going to be, you decide to use a sabermetric
statistic⁹ known as “VORP,” or “value over replacement player.” A player with
a  higher  VORP  is  more  valuable  than  a  player  with  a  lower  VORP.  A  player  with  a 
higher  VORP  is  not  necessarily  more  expensive  to  sign  than  a  player  with  a  lower 
VORP,  because  factors  other  than  a  player’s  value  determine  how  much  it  costs  to 
sign  him. 

For  each  available  free-agent  player,  you  have  three  pieces  of  information: 

•  the  player’s  position, 

•  the  amount  of  money  it  will  cost  to  sign  the  player,  and 

•  the  player’s  VORP. 

Devise  an  algorithm  that  maximizes  the  total  VORP  of  the  players  you  sign  while 
spending no more than $X altogether. You may assume that each player signs for a
multiple  of  $100,000.  Your  algorithm  should  output  the  total  VORP  of  the  players 
you  sign,  the  total  amount  of  money  you  spend,  and  a  list  of  which  players  you 
sign.  Analyze  the  running  time  and  space  requirement  of  your  algorithm. 


Chapter  notes 

R.  Bellman  began  the  systematic  study  of  dynamic  programming  in  1955.  The 
word “programming,” both here and in linear programming, refers to using a tabular
solution method. Although optimization techniques incorporating elements of
dynamic  programming  were  known  earlier,  Bellman  provided  the  area  with  a  solid 
mathematical  basis  [37]. 


⁸Although there are nine positions on a baseball team, N is not necessarily equal to 9 because some
general managers have particular ways of thinking about positions. For example, a general manager
might consider right-handed pitchers and left-handed pitchers to be separate “positions,” as well as
starting pitchers, long relief pitchers (relief pitchers who can pitch several innings), and short relief
pitchers (relief pitchers who normally pitch at most only one inning).

⁹Sabermetrics is the application of statistical analysis to baseball records. It provides several ways
to compare the relative values of individual players.

Galil and Park [125] classify dynamic-programming algorithms according to the
size of the table and the number of other table entries each entry depends on. They
call a dynamic-programming algorithm tD/eD if its table size is O(n^t) and each
entry depends on O(n^e) other entries. For example, the matrix-chain multiplication
algorithm in Section 15.2 would be 2D/1D, and the longest-common-subsequence
algorithm in Section 15.4 would be 2D/0D.

Hu and Shing [182, 183] give an O(n lg n)-time algorithm for the matrix-chain
multiplication problem.

The O(mn)-time algorithm for the longest-common-subsequence problem
appears to be a folk algorithm. Knuth [70] posed the question of whether subquadratic
algorithms for the LCS problem exist. Masek and Paterson [244] answered this
question in the affirmative by giving an algorithm that runs in O(mn / lg n) time,
where n ≤ m and the sequences are drawn from a set of bounded size. For the
special case in which no element appears more than once in an input sequence,
Szymanski [326] shows how to solve the problem in O((n + m) lg(n + m)) time.
Many of these results extend to the problem of computing string edit distances
(Problem 15-5).

An early paper on variable-length binary encodings by Gilbert and Moore [133]
had applications to constructing optimal binary search trees for the case in which all
probabilities p_i are 0; this paper contains an O(n^3)-time algorithm. Aho, Hopcroft,
and Ullman [5] present the algorithm from Section 15.5. Exercise 15.5-4 is due to
Knuth [212]. Hu and Tucker [184] devised an algorithm for the case in which all
probabilities p_i are 0 that uses O(n^2) time and O(n) space; subsequently, Knuth
[211] reduced the time to O(n lg n).

Problem 15-8 is due to Avidan and Shamir [27], who have posted on the Web a
wonderful video illustrating this image-compression technique.

Algorithms for optimization problems typically go through a sequence of steps,
with a set of choices at each step. For many optimization problems, using dynamic
programming to determine the best choices is overkill; simpler, more efficient
algorithms will do. A greedy algorithm always makes the choice that looks best at
the moment. That is, it makes a locally optimal choice in the hope that this choice
will lead to a globally optimal solution. This chapter explores optimization
problems for which greedy algorithms provide optimal solutions. Before reading this
chapter, you should read about dynamic programming in Chapter 15, particularly
Section 15.3.

Greedy algorithms do not always yield optimal solutions, but for many problems
they do. We shall first examine, in Section 16.1, a simple but nontrivial problem,
the activity-selection problem, for which a greedy algorithm efficiently computes
an optimal solution. We shall arrive at the greedy algorithm by first considering
a dynamic-programming approach and then showing that we can always make
greedy choices to arrive at an optimal solution. Section 16.2 reviews the basic
elements of the greedy approach, giving a direct approach for proving greedy
algorithms correct. Section 16.3 presents an important application of greedy
techniques: designing data-compression (Huffman) codes. In Section 16.4, we
investigate some of the theory underlying combinatorial structures called “matroids,”
for which a greedy algorithm always produces an optimal solution. Finally,
Section 16.5 applies matroids to solve a problem of scheduling unit-time tasks with
deadlines and penalties.

The greedy method is quite powerful and works well for a wide range of
problems. Later chapters will present many algorithms that we can view as
applications of the greedy method, including minimum-spanning-tree algorithms
(Chapter 23), Dijkstra’s algorithm for shortest paths from a single source (Chapter 24),
and Chvátal’s greedy set-covering heuristic (Chapter 35). Minimum-spanning-tree
algorithms furnish a classic example of the greedy method. Although you can read
this chapter and Chapter 23 independently of each other, you might find it useful
to read them together.


16.1  An  activity-selection  problem 

Our first example is the problem of scheduling several competing activities that
require exclusive use of a common resource, with a goal of selecting a maximum-size
set of mutually compatible activities. Suppose we have a set S = {a_1, a_2, ..., a_n}
of n proposed activities that wish to use a resource, such as a lecture hall, which
can serve only one activity at a time. Each activity a_i has a start time s_i and a finish
time f_i, where 0 ≤ s_i < f_i < ∞. If selected, activity a_i takes place during the
half-open time interval [s_i, f_i). Activities a_i and a_j are compatible if the intervals
[s_i, f_i) and [s_j, f_j) do not overlap. That is, a_i and a_j are compatible if s_i ≥ f_j
or s_j ≥ f_i. In the activity-selection problem, we wish to select a maximum-size
subset of mutually compatible activities. We assume that the activities are sorted
in monotonically increasing order of finish time:

\[ f_1 \le f_2 \le f_3 \le \cdots \le f_{n-1} \le f_n . \tag{16.1} \]

(We shall see later the advantage that this assumption provides.) For example,
consider the following set S of activities:

i      1   2   3   4   5   6   7   8   9  10  11
s_i    1   3   0   5   3   5   6   8   8   2  12
f_i    4   5   6   7   9   9  10  11  12  14  16

For this example, the subset {a_3, a_9, a_11} consists of mutually compatible activities.
It is not a maximum subset, however, since the subset {a_1, a_4, a_8, a_11} is larger. In
fact, {a_1, a_4, a_8, a_11} is a largest subset of mutually compatible activities; another
largest subset is {a_2, a_4, a_9, a_11}.
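The compatibility condition is easy to test directly. The following sketch encodes the example activities as (start, finish) pairs and checks the subsets mentioned above; the names `ACTIVITIES`, `compatible`, and `mutually_compatible` are chosen here for illustration only.

```python
# Start and finish times of the example activities, keyed by activity index i.
ACTIVITIES = {
    1: (1, 4), 2: (3, 5), 3: (0, 6), 4: (5, 7), 5: (3, 9), 6: (5, 9),
    7: (6, 10), 8: (8, 11), 9: (8, 12), 10: (2, 14), 11: (12, 16),
}


def compatible(a, b):
    """True if half-open intervals [s_a, f_a) and [s_b, f_b) do not overlap."""
    (s_a, f_a), (s_b, f_b) = a, b
    return s_a >= f_b or s_b >= f_a


def mutually_compatible(indices, activities=ACTIVITIES):
    """True if every pair of the listed activities is compatible."""
    chosen = [activities[i] for i in indices]
    return all(compatible(chosen[p], chosen[q])
               for p in range(len(chosen))
               for q in range(p + 1, len(chosen)))
```

Because the intervals are half-open, an activity that starts exactly when another finishes (such as a_4 after a_2) counts as compatible.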

We shall solve this problem in several steps. We start by thinking about a
dynamic-programming solution, in which we consider several choices when
determining which subproblems to use in an optimal solution. We shall then observe that
we need to consider only one choice (the greedy choice) and that when we make
the greedy choice, only one subproblem remains. Based on these observations, we
shall develop a recursive greedy algorithm to solve the activity-scheduling
problem. We shall complete the process of developing a greedy solution by converting
the  recursive  algorithm  to  an  iterative  one.  Although  the  steps  we  shall  go  through 
in  this  section  are  slightly  more  involved  than  is  typical  when  developing  a  greedy 
algorithm,  they  illustrate  the  relationship  between  greedy  algorithms  and  dynamic 
programming.

The optimal substructure of the activity-selection problem

We can easily verify that the activity-selection problem exhibits optimal
substructure. Let us denote by S_{ij} the set of activities that start after activity a_i finishes and
that finish before activity a_j starts. Suppose that we wish to find a maximum set of
mutually compatible activities in S_{ij}, and suppose further that such a maximum set
is A_{ij}, which includes some activity a_k. By including a_k in an optimal solution, we
are left with two subproblems: finding mutually compatible activities in the set S_{ik}
(activities that start after activity a_i finishes and that finish before activity a_k starts)
and finding mutually compatible activities in the set S_{kj} (activities that start after
activity a_k finishes and that finish before activity a_j starts). Let A_{ik} = A_{ij} ∩ S_{ik}
and A_{kj} = A_{ij} ∩ S_{kj}, so that A_{ik} contains the activities in A_{ij} that finish before a_k
starts and A_{kj} contains the activities in A_{ij} that start after a_k finishes. Thus, we
have A_{ij} = A_{ik} ∪ {a_k} ∪ A_{kj}, and so the maximum-size set A_{ij} of mutually
compatible activities in S_{ij} consists of |A_{ij}| = |A_{ik}| + |A_{kj}| + 1 activities.

The usual cut-and-paste argument shows that the optimal solution A_{ij} must also
include optimal solutions to the two subproblems for S_{ik} and S_{kj}. If we could
find a set A'_{kj} of mutually compatible activities in S_{kj} where |A'_{kj}| > |A_{kj}|, then
we could use A'_{kj}, rather than A_{kj}, in a solution to the subproblem for S_{ij}. We
would have constructed a set of |A_{ik}| + |A'_{kj}| + 1 > |A_{ik}| + |A_{kj}| + 1 = |A_{ij}|
mutually compatible activities, which contradicts the assumption that A_{ij} is an
optimal solution. A symmetric argument applies to the activities in S_{ik}.

This way of characterizing optimal substructure suggests that we might solve
the activity-selection problem by dynamic programming. If we denote the size of
an optimal solution for the set S_{ij} by c[i,j], then we would have the recurrence

c[i,j] = c[i,k] + c[k,j] + 1 .

Of course, if we did not know that an optimal solution for the set S_{ij} includes
activity a_k, we would have to examine all activities in S_{ij} to find which one to
choose, so that

\[ c[i,j] = \begin{cases} 0 & \text{if } S_{ij} = \emptyset , \\ \max_{a_k \in S_{ij}} \{ c[i,k] + c[k,j] + 1 \} & \text{if } S_{ij} \ne \emptyset . \end{cases} \tag{16.2} \]
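As a rough illustration of recurrence (16.2), here is a memoized Python sketch that evaluates it on the example activities. It is not an algorithm given in the text: the function name `max_compatible` and, in particular, the two sentinel activities a_0 (finishing at −∞) and a_{n+1} (starting at +∞) are assumptions introduced here so that the whole problem can be written as the single subproblem S_{0,n+1}.

```python
from functools import lru_cache

# Start and finish times of the example activities a_1 .. a_11.
S = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
F = [4, 5, 6, 7, 9, 9, 10, 11, 12, 14, 16]


def max_compatible(starts, finishes):
    """Size of a maximum set of mutually compatible activities, via recurrence (16.2)."""
    n = len(starts)
    # Sentinels (an assumption of this sketch): a_0 finishes before everything,
    # a_{n+1} starts after everything, so S_{0,n+1} is the full set of activities.
    s = [float("-inf")] + list(starts) + [float("inf")]
    f = [float("-inf")] + list(finishes) + [float("inf")]

    @lru_cache(maxsize=None)
    def c(i, j):
        best = 0                                   # value when S_ij is empty
        for k in range(1, n + 1):
            if s[k] >= f[i] and f[k] <= s[j]:      # a_k belongs to S_ij
                best = max(best, c(i, k) + c(k, j) + 1)
        return best

    return c(0, n + 1)


largest = max_compatible(S, F)
```

The recursion terminates because s_k < f_k for every activity, so each subproblem S_{ik} or S_{kj} is a strict subset of S_{ij}. There are O(n^2) distinct pairs (i, j) and each tries up to n choices of k, so this sketch takes O(n^3) time.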

We could then develop a recursive algorithm and memoize it, or we could work
bottom-up and fill in table entries as we go along. But we would be overlooking
another important characteristic of the activity-selection problem that we can use
to great advantage.

Making the greedy choice

What  if  we  could  choose  an  activity  to  add  to  our  optimal  solution  without  having 
to  first  solve  all  the  subproblems?  That  could  save  us  from  having  to  consider  all 
the  choices  inherent  in  recurrence  (16.2).  In  fact,  for  the  activity-selection  problem, 
we  need  consider  only  one  choice:  the  greedy  choice. 

What do we mean by the greedy choice for the activity-selection problem?
Intuition suggests that we should choose an activity that leaves the resource available
for as many other activities as possible. Now, of the activities we end up
choosing, one of them must be the first one to finish. Our intuition tells us, therefore,
to choose the activity in S with the earliest finish time, since that would leave the
resource available for as many of the activities that follow it as possible. (If more
than one activity in S has the earliest finish time, then we can choose any such
activity.) In other words, since the activities are sorted in monotonically increasing
order by finish time, the greedy choice is activity a_1. Choosing the first activity
to  finish  is  not  the  only  way  to  think  of  making  a  greedy  choice  for  this  problem; 
Exercise  16.1-3  asks  you  to  explore  other  possibilities. 

If we make the greedy choice, we have only one remaining subproblem to solve:
finding activities that start after a_1 finishes. Why don’t we have to consider
activities that finish before a_1 starts? We have that s_1 < f_1, and f_1 is the earliest
finish time of any activity, and therefore no activity can have a finish time less than
or equal to s_1. Thus, all activities that are compatible with activity a_1 must start
after a_1 finishes.

Furthermore, we have already established that the activity-selection problem
exhibits optimal substructure. Let S_k = {a_i ∈ S : s_i ≥ f_k} be the set of activities that
start after activity a_k finishes. If we make the greedy choice of activity a_1, then S_1
remains as the only subproblem to solve.¹ Optimal substructure tells us that if a_1
is in the optimal solution, then an optimal solution to the original problem consists
of activity a_1 and all the activities in an optimal solution to the subproblem S_1.

One big question remains: is our intuition correct? Is the greedy choice, in
which we choose the first activity to finish, always part of some optimal solution?
The following theorem shows that it is.


1 We sometimes refer to the sets Sk as subproblems rather than as just sets of activities. It will always
be clear from the context whether we are referring to Sk as a set of activities or as a subproblem
whose input is that set.

Theorem 16.1

Consider any nonempty subproblem Sk, and let am be an activity in Sk with the
earliest finish time. Then am is included in some maximum-size subset of mutually
compatible activities of Sk.

Proof Let Ak be a maximum-size subset of mutually compatible activities in Sk,
and let aj be the activity in Ak with the earliest finish time. If aj = am, we are
done, since we have shown that am is in some maximum-size subset of mutually
compatible activities of Sk. If aj ≠ am, let the set A'k = Ak − {aj} ∪ {am} be Ak
but substituting am for aj. The activities in A'k are disjoint, which follows because
the activities in Ak are disjoint, aj is the first activity in Ak to finish, and fm ≤ fj.
Since |A'k| = |Ak|, we conclude that A'k is a maximum-size subset of mutually
compatible activities of Sk, and it includes am. ∎
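
The exchange step of the proof can be replayed on a concrete instance. The Python sketch below is illustrative only. It assumes the start and finish times of the 11-activity instance that Figure 16.1 refers to, stored in 1-indexed lists with a placeholder at index 0, and the helper names mutually_compatible and exchange are hypothetical, not part of the text.

```python
from itertools import combinations

# Assumed instance: 11 activities sorted by finish time; index 0 is a placeholder.
s = [0, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [0, 4, 5, 6, 7, 9, 9, 10, 11, 12, 14, 16]


def mutually_compatible(acts, s, f):
    """True if no two activities in acts overlap (intervals are half-open)."""
    return all(f[a] <= s[b] or f[b] <= s[a] for a, b in combinations(acts, 2))


def exchange(A, m, f):
    """Return A with its earliest-finishing activity replaced by activity m."""
    j = min(A, key=lambda i: f[i])
    return (A - {j}) | {m}


A = {2, 4, 9, 11}             # a compatible set of size 4 for this instance
m = 1                         # a1 has the earliest finish time
A_prime = exchange(A, m, f)   # swap a2 (earliest finisher in A) for a1

print(sorted(A_prime), mutually_compatible(A_prime, s, f))
```

The swapped set has the same size as A and stays compatible, which is the situation the proof describes when fm ≤ fj.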

Thus, we see that although we might be able to solve the activity-selection problem
with dynamic programming, we don’t need to. (Besides, we have not yet
examined whether the activity-selection problem even has overlapping subproblems.)
Instead, we can repeatedly choose the activity that finishes first, keep only
the  activities  compatible  with  this  activity,  and  repeat  until  no  activities  remain. 
Moreover,  because  we  always  choose  the  activity  with  the  earliest  finish  time,  the 
finish  times  of  the  activities  we  choose  must  strictly  increase.  We  can  consider 
each  activity  just  once  overall,  in  monotonically  increasing  order  of  finish  times. 

An  algorithm  to  solve  the  activity-selection  problem  does  not  need  to  work 
bottom-up,  like  a  table-based  dynamic-programming  algorithm.  Instead,  it  can 
work top-down, choosing an activity to put into the optimal solution and then solving
the subproblem of choosing activities from those that are compatible with those
already  chosen.  Greedy  algorithms  typically  have  this  top-down  design:  make  a 
choice  and  then  solve  a  subproblem,  rather  than  the  bottom-up  technique  of  solving 
subproblems  before  making  a  choice. 

A  recursive  greedy  algorithm 

Now that we have seen how to bypass the dynamic-programming approach and instead
use a top-down, greedy algorithm, we can write a straightforward, recursive
procedure to solve the activity-selection problem. The procedure
Recursive-Activity-Selector takes the start and finish times of the activities, represented
as arrays s and f,2 the index k that defines the subproblem Sk it is to solve, and
the size n of the original problem. It returns a maximum-size set of mutually
compatible activities in Sk. We assume that the n input activities are already ordered
by monotonically increasing finish time, according to equation (16.1). If not, we
can sort them into this order in O(n lg n) time, breaking ties arbitrarily. In order
to start, we add the fictitious activity a0 with f0 = 0, so that subproblem S0 is
the entire set of activities S. The initial call, which solves the entire problem, is
Recursive-Activity-Selector(s, f, 0, n).

2 Because the pseudocode takes s and f as arrays, it indexes into them with square brackets rather
than subscripts.

Recursive-Activity-Selector(s, f, k, n)
1  m = k + 1
2  while m ≤ n and s[m] < f[k]      // find the first activity in Sk to finish
3      m = m + 1
4  if m ≤ n
5      return {am} ∪ Recursive-Activity-Selector(s, f, m, n)
6  else return ∅

Figure 16.1 shows the operation of the algorithm. In a given recursive call
Recursive-Activity-Selector(s, f, k, n), the while loop of lines 2-3 looks
for the first activity in Sk to finish. The loop examines ak+1, ak+2, ..., an, until
it finds the first activity am that is compatible with ak; such an activity has
sm ≥ fk. If the loop terminates because it finds such an activity, line 5 returns
the union of {am} and the maximum-size subset of Sm returned by the recursive
call Recursive-Activity-Selector(s, f, m, n). Alternatively, the loop may
terminate because m > n, in which case we have examined all activities in Sk
without finding one that is compatible with ak. In this case, Sk = ∅, and so the
procedure returns ∅ in line 6.
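
As a rough illustration, the recursive procedure can be transcribed into Python almost line for line. This is a sketch under stated assumptions: the arrays are 1-indexed with the fictitious activity a0 at index 0, the function returns a list of activity indices rather than a set of activities, and the sample data is assumed to be the 11-activity instance that Figure 16.1 refers to.

```python
def recursive_activity_selector(s, f, k, n):
    """Return indices of a maximum-size compatible subset of S_k."""
    m = k + 1
    # Skip every activity that starts before activity k finishes.
    while m <= n and s[m] < f[k]:
        m += 1
    if m <= n:
        # Activity m is the first in S_k to finish; recurse on S_m.
        return [m] + recursive_activity_selector(s, f, m, n)
    return []


# Assumed instance; index 0 holds the fictitious activity a0 with f[0] = 0.
s = [0, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [0, 4, 5, 6, 7, 9, 9, 10, 11, 12, 14, 16]
selected = recursive_activity_selector(s, f, 0, len(s) - 1)
print(selected)  # [1, 4, 8, 11]
```

The recursion depth equals the number of selected activities plus one, so on very large inputs a Python version of this kind may hit the interpreter's recursion limit.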

Assuming that the activities have already been sorted by finish times, the running
time of the call Recursive-Activity-Selector(s, f, 0, n) is Θ(n), which we
can see as follows. Over all recursive calls, each activity is examined exactly once
in the while loop test of line 2. In particular, activity ai is examined in the last call
made in which k < i.

An  iterative  greedy  algorithm 

We  easily  can  convert  our  recursive  procedure  to  an  iterative  one.  The  procedure 
Recursive-Activity-Selector  is  almost  “tail  recursive”  (see  Problem  7-4): 
it  ends  with  a  recursive  call  to  itself  followed  by  a  union  operation.  It  is  usually  a 
straightforward  task  to  transform  a  tail-recursive  procedure  to  an  iterative  form;  in 
fact, some compilers for certain programming languages perform this task automatically.
As written, Recursive-Activity-Selector works for subproblems Sk,
i.e.,  subproblems  that  consist  of  the  last  activities  to  finish. 


[Figure 16.1: table columns k, sk, fk; horizontal axis: time, 0 to 16]

Figure 16.1 The operation of Recursive-Activity-Selector on the 11 activities given
earlier. Activities considered in each recursive call appear between horizontal lines. The fictitious
activity a0 finishes at time 0, and the initial call Recursive-Activity-Selector(s, f, 0, 11)
selects activity a1. In each recursive call, the activities that have already been selected are shaded,
and the activity shown in white is being considered. If the starting time of an activity occurs before
the finish time of the most recently added activity (the arrow between them points left), it is
rejected. Otherwise (the arrow points directly up or to the right), it is selected. The last recursive call,
Recursive-Activity-Selector(s, f, 11, 11), returns ∅. The resulting set of selected activities is
{a1, a4, a8, a11}.

The procedure Greedy-Activity-Selector is an iterative version of the procedure
Recursive-Activity-Selector. It also assumes that the input activities
are ordered by monotonically increasing finish time. It collects selected activities
into a set A and returns this set when it is done.

Greedy-Activity-Selector(s, f)
1  n = s.length
2  A = {a1}
3  k = 1
4  for m = 2 to n
5      if s[m] ≥ f[k]
6          A = A ∪ {am}
7          k = m
8  return A

The  procedure  works  as  follows.  The  variable  k  indexes  the  most  recent  addition 
to  A,  corresponding  to  the  activity  ak  in  the  recursive  version.  Since  we  consider 
the  activities  in  order  of  monotonically  increasing  finish  time,  fk  is  always  the 
maximum  finish  time  of  any  activity  in  A.  That  is, 

fk = max {fi : ai ∈ A} .     (16.3)

Lines 2-3 select activity a1, initialize A to contain just this activity, and initialize k
to index this activity. The for loop of lines 4-7 finds the earliest activity in Sk to
finish. The loop considers each activity am in turn and adds am to A if it is compatible
with all previously selected activities; such an activity is the earliest in Sk to
finish. To see whether activity am is compatible with every activity currently in A,
it suffices by equation (16.3) to check (in line 5) that its start time sm is not earlier
than the finish time fk of the activity most recently added to A. If activity am is
compatible, then lines 6-7 add activity am to A and set k to m. The set A returned
by the call Greedy-Activity-Selector(s, f) is precisely the set returned by
the call Recursive-Activity-Selector(s, f, 0, n).

Like the recursive version, Greedy-Activity-Selector schedules a set of n
activities in Θ(n) time, assuming that the activities were already sorted initially by
their finish times.
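
A minimal Python sketch of the iterative procedure follows. It assumes 1-indexed lists with an unused placeholder at index 0 and returns a list of indices. The wrapper select_activities is a hypothetical convenience, not part of the text, that first sorts unsorted (start, finish) pairs by finish time.

```python
def greedy_activity_selector(s, f):
    """Iterative selection; s and f are 1-indexed with a placeholder at 0."""
    n = len(s) - 1
    if n == 0:
        return []
    A = [1]          # the first activity to finish is always selected
    k = 1            # index of the most recently selected activity
    for m in range(2, n + 1):
        if s[m] >= f[k]:     # a_m starts no earlier than a_k finishes
            A.append(m)
            k = m
    return A


def select_activities(pairs):
    """Sort (start, finish) pairs by finish time, then apply the greedy rule."""
    ordered = sorted(pairs, key=lambda p: p[1])
    s = [0] + [p[0] for p in ordered]
    f = [0] + [p[1] for p in ordered]
    return [ordered[i - 1] for i in greedy_activity_selector(s, f)]


# Assumed instance: the 11 activities that Figure 16.1 refers to.
s = [0, 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [0, 4, 5, 6, 7, 9, 9, 10, 11, 12, 14, 16]
print(greedy_activity_selector(s, f))  # [1, 4, 8, 11]
```

With the sort included, the wrapper takes O(n lg n) time overall; the selection loop itself remains linear.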

Exercises 


16.1-1 

Give a dynamic-programming algorithm for the activity-selection problem, based
on recurrence (16.2). Have your algorithm compute the sizes c[i, j] as defined
above and also produce the maximum-size subset of mutually compatible activities.
Assume that the inputs have been sorted as in equation (16.1). Compare the running
time of your solution to the running time of Greedy-Activity-Selector.


16.1-2 

Suppose  that  instead  of  always  selecting  the  first  activity  to  finish,  we  instead  select 
the last activity to start that is compatible with all previously selected activities. Describe
how this approach is a greedy algorithm, and prove that it yields an optimal
solution. 


16.1-3 

Not just any greedy approach to the activity-selection problem produces a maximum-size
set of mutually compatible activities. Give an example to show that
the  approach  of  selecting  the  activity  of  least  duration  from  among  those  that  are 
compatible  with  previously  selected  activities  does  not  work.  Do  the  same  for 
the  approaches  of  always  selecting  the  compatible  activity  that  overlaps  the  fewest 
other  remaining  activities  and  always  selecting  the  compatible  remaining  activity 
with  the  earliest  start  time. 


16.1-4 

Suppose  that  we  have  a  set  of  activities  to  schedule  among  a  large  number  of  lecture 
halls,  where  any  activity  can  take  place  in  any  lecture  hall.  We  wish  to  schedule 
all  the  activities  using  as  few  lecture  halls  as  possible.  Give  an  efficient  greedy 
algorithm  to  determine  which  activity  should  use  which  lecture  hall. 

(This  problem  is  also  known  as  the  interval-graph  coloring  problem.  We  can 
create  an  interval  graph  whose  vertices  are  the  given  activities  and  whose  edges 
connect  incompatible  activities.  The  smallest  number  of  colors  required  to  color 
every  vertex  so  that  no  two  adjacent  vertices  have  the  same  color  corresponds  to 
finding  the  fewest  lecture  halls  needed  to  schedule  all  of  the  given  activities.) 


16.1-5 

Consider a modification to the activity-selection problem in which each activity ai
has, in addition to a start and finish time, a value vi. The objective is no longer
to maximize the number of activities scheduled, but instead to maximize the total
value of the activities scheduled. That is, we wish to choose a set A of compatible
activities such that $\sum_{a_k \in A} v_k$ is maximized. Give a polynomial-time algorithm for
this problem.


16.2 Elements of the greedy strategy

A  greedy  algorithm  obtains  an  optimal  solution  to  a  problem  by  making  a  sequence 
of choices. At each decision point, the algorithm makes the choice that seems best at
the  moment.  This  heuristic  strategy  does  not  always  produce  an  optimal  solution, 
but  as  we  saw  in  the  activity-selection  problem,  sometimes  it  does.  This  section 
discusses  some  of  the  general  properties  of  greedy  methods. 

The  process  that  we  followed  in  Section  16.1  to  develop  a  greedy  algorithm  was 
a  bit  more  involved  than  is  typical.  We  went  through  the  following  steps: 

1. Determine the optimal substructure of the problem.

2. Develop a recursive solution. (For the activity-selection problem, we formulated
recurrence (16.2), but we bypassed developing a recursive algorithm based
on this recurrence.)

3.  Show  that  if  we  make  the  greedy  choice,  then  only  one  subproblem  remains. 

4.  Prove  that  it  is  always  safe  to  make  the  greedy  choice.  (Steps  3  and  4  can  occur 
in  either  order.) 

5.  Develop  a  recursive  algorithm  that  implements  the  greedy  strategy. 

6.  Convert  the  recursive  algorithm  to  an  iterative  algorithm. 

In going through these steps, we saw in great detail the dynamic-programming
underpinnings of a greedy algorithm. For example, in the activity-selection problem,
we first defined the subproblems Sij, where both i and j varied. We then found
that if we always made the greedy choice, we could restrict the subproblems to be
of the form Sk.

Alternatively,  we  could  have  fashioned  our  optimal  substructure  with  a  greedy 
choice  in  mind,  so  that  the  choice  leaves  just  one  subproblem  to  solve.  In  the 
activity-selection  problem,  we  could  have  started  by  dropping  the  second  subscript 
and defining subproblems of the form Sk. Then, we could have proven that a greedy
choice (the first activity am to finish in Sk), combined with an optimal solution to
the remaining set Sm of compatible activities, yields an optimal solution to Sk.
More  generally,  we  design  greedy  algorithms  according  to  the  following  sequence 
of  steps: 

1.  Cast  the  optimization  problem  as  one  in  which  we  make  a  choice  and  are  left 
with  one  subproblem  to  solve. 

2. Prove that there is always an optimal solution to the original problem that makes
the greedy choice, so that the greedy choice is always safe.

3. Demonstrate optimal substructure by showing that, having made the greedy
choice,  what  remains  is  a  subproblem  with  the  property  that  if  we  combine  an 
optimal  solution  to  the  subproblem  with  the  greedy  choice  we  have  made,  we 
arrive  at  an  optimal  solution  to  the  original  problem. 

We shall use this more direct process in later sections of this chapter. Nevertheless,
beneath every greedy algorithm, there is almost always a more cumbersome
dynamic-programming solution.

How  can  we  tell  whether  a  greedy  algorithm  will  solve  a  particular  optimization 
problem?  No  way  works  all  the  time,  but  the  greedy-choice  property  and  optimal 
substructure  are  the  two  key  ingredients.  If  we  can  demonstrate  that  the  problem 
has  these  properties,  then  we  are  well  on  the  way  to  developing  a  greedy  algorithm 
for  it. 

Greedy-choice  property 

The first key ingredient is the greedy-choice property: we can assemble a globally
optimal  solution  by  making  locally  optimal  (greedy)  choices.  In  other  words,  when 
we  are  considering  which  choice  to  make,  we  make  the  choice  that  looks  best  in 
the  current  problem,  without  considering  results  from  subproblems. 

Here  is  where  greedy  algorithms  differ  from  dynamic  programming.  In  dynamic 
programming,  we  make  a  choice  at  each  step,  but  the  choice  usually  depends  on  the 
solutions  to  subproblems.  Consequently,  we  typically  solve  dynamic-programming 
problems  in  a  bottom-up  manner,  progressing  from  smaller  subproblems  to  larger 
subproblems.  (Alternatively,  we  can  solve  them  top  down,  but  memoizing.  Of 
course, even though the code works top down, we still must solve the subproblems
before making a choice.) In a greedy algorithm, we make whatever choice
seems  best  at  the  moment  and  then  solve  the  subproblem  that  remains.  The  choice 
made  by  a  greedy  algorithm  may  depend  on  choices  so  far,  but  it  cannot  depend  on 
any future choices or on the solutions to subproblems. Thus, unlike dynamic programming,
which solves the subproblems before making the first choice, a greedy
algorithm makes its first choice before solving any subproblems. A dynamic-programming
algorithm proceeds bottom up, whereas a greedy strategy usually
progresses in a top-down fashion, making one greedy choice after another, reducing
each given problem instance to a smaller one.

Of  course,  we  must  prove  that  a  greedy  choice  at  each  step  yields  a  globally 
optimal  solution.  Typically,  as  in  the  case  of  Theorem  16.1,  the  proof  examines 
a  globally  optimal  solution  to  some  subproblem.  It  then  shows  how  to  modify 
the  solution  to  substitute  the  greedy  choice  for  some  other  choice,  resulting  in  one 
similar,  but  smaller,  subproblem. 

We  can  usually  make  the  greedy  choice  more  efficiently  than  when  we  have  to 
consider a wider set of choices. For example, in the activity-selection problem, assuming
that we had already sorted the activities in monotonically increasing order
of  finish  times,  we  needed  to  examine  each  activity  just  once.  By  preprocessing  the 
input  or  by  using  an  appropriate  data  structure  (often  a  priority  queue),  we  often 
can  make  greedy  choices  quickly,  thus  yielding  an  efficient  algorithm. 

Optimal  substructure 

A  problem  exhibits  optimal  substructure  if  an  optimal  solution  to  the  problem 
contains within it optimal solutions to subproblems. This property is a key ingredient
of assessing the applicability of dynamic programming as well as greedy
algorithms. As an example of optimal substructure, recall how we demonstrated in
Section 16.1 that if an optimal solution to subproblem Sij includes an activity ak,
then it must also contain optimal solutions to the subproblems Sik and Skj. Given
this optimal substructure, we argued that if we knew which activity to use as ak, we
could construct an optimal solution to Sij by selecting ak along with all activities
in optimal solutions to the subproblems Sik and Skj. Based on this observation of
optimal  substructure,  we  were  able  to  devise  the  recurrence  (16.2)  that  described 
the  value  of  an  optimal  solution. 

We  usually  use  a  more  direct  approach  regarding  optimal  substructure  when 
applying  it  to  greedy  algorithms.  As  mentioned  above,  we  have  the  luxury  of 
assuming  that  we  arrived  at  a  subproblem  by  having  made  the  greedy  choice  in 
the  original  problem.  All  we  really  need  to  do  is  argue  that  an  optimal  solution  to 
the  subproblem,  combined  with  the  greedy  choice  already  made,  yields  an  optimal 
solution  to  the  original  problem.  This  scheme  implicitly  uses  induction  on  the 
subproblems  to  prove  that  making  the  greedy  choice  at  every  step  produces  an 
optimal  solution. 

Greedy  versus  dynamic  programming 

Because both the greedy and dynamic-programming strategies exploit optimal substructure,
you might be tempted to generate a dynamic-programming solution to a
problem when a greedy solution suffices or, conversely, you might mistakenly think
that a greedy solution works when in fact a dynamic-programming solution is required.
To illustrate the subtleties between the two techniques, let us investigate
two  variants  of  a  classical  optimization  problem. 

The  0-1  knapsack  problem  is  the  following.  A  thief  robbing  a  store  finds  n 
items. The ith item is worth vi dollars and weighs wi pounds, where vi and wi are
integers.  The  thief  wants  to  take  as  valuable  a  load  as  possible,  but  he  can  carry  at 
most  W  pounds  in  his  knapsack,  for  some  integer  W.  Which  items  should  he  take? 
(We call this the 0-1 knapsack problem because for each item, the thief must either
take it or leave it behind; he cannot take a fractional amount of an item or take an
item  more  than  once.) 

In  the  fractional  knapsack  problem,  the  setup  is  the  same,  but  the  thief  can  take 
fractions  of  items,  rather  than  having  to  make  a  binary  (0-1)  choice  for  each  item. 
You  can  think  of  an  item  in  the  0-1  knapsack  problem  as  being  like  a  gold  ingot 
and  an  item  in  the  fractional  knapsack  problem  as  more  like  gold  dust. 

Both  knapsack  problems  exhibit  the  optimal-substructure  property.  For  the  0-1 
problem,  consider  the  most  valuable  load  that  weighs  at  most  W  pounds.  If  we 
remove  item  j  from  this  load,  the  remaining  load  must  be  the  most  valuable  load 
weighing at most W − wj that the thief can take from the n − 1 original items
excluding j. For the comparable fractional problem, consider that if we remove
a weight w of one item j from the optimal load, the remaining load must be the
most valuable load weighing at most W − w that the thief can take from the n − 1
original items plus wj − w pounds of item j.

Although  the  problems  are  similar,  we  can  solve  the  fractional  knapsack  problem 
by  a  greedy  strategy,  but  we  cannot  solve  the  0-1  problem  by  such  a  strategy.  To 
solve the fractional problem, we first compute the value per pound vi/wi for each
item. Obeying a greedy strategy, the thief begins by taking as much as possible of
the item with the greatest value per pound. If the supply of that item is exhausted
and he can still carry more, he takes as much as possible of the item with the next
greatest value per pound, and so forth, until he reaches his weight limit W. Thus,
by sorting the items by value per pound, the greedy algorithm runs in O(n lg n)
time. We leave the proof that the fractional knapsack problem has the
greedy-choice property as Exercise 16.2-1.
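
The greedy strategy for the fractional problem can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function name fractional_knapsack and the (value, weight) pair representation are hypothetical, and the sample data is the three-item, 50-pound instance of Figure 16.2.

```python
def fractional_knapsack(items, capacity):
    """items: list of (value, weight) pairs with positive weights.

    Returns (total value, list of (item index, weight taken)).
    """
    # Consider items in decreasing order of value per pound.
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1],
                   reverse=True)
    remaining = capacity
    total = 0.0
    taken = []
    for i in order:
        if remaining <= 0:
            break
        value, weight = items[i]
        amount = min(weight, remaining)      # take as much as still fits
        total += value * amount / weight
        taken.append((i, amount))
        remaining -= amount
    return total, taken


items = [(60, 10), (100, 20), (120, 30)]     # (dollars, pounds)
total, taken = fractional_knapsack(items, 50)
print(total, taken)
```

On this instance the sketch takes all of the first two items and 20 of the 30 pounds of the third, for a total value of 240 dollars. The sort dominates the running time, matching the O(n lg n) bound.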

To  see  that  this  greedy  strategy  does  not  work  for  the  0-1  knapsack  problem, 
consider  the  problem  instance  illustrated  in  Figure  16.2(a).  This  example  has  3 
items  and  a  knapsack  that  can  hold  50  pounds.  Item  1  weighs  10  pounds  and 
is  worth  60  dollars.  Item  2  weighs  20  pounds  and  is  worth  100  dollars.  Item  3 
weighs  30  pounds  and  is  worth  120  dollars.  Thus,  the  value  per  pound  of  item  1  is 
6  dollars  per  pound,  which  is  greater  than  the  value  per  pound  of  either  item  2  (5 
dollars  per  pound)  or  item  3  (4  dollars  per  pound).  The  greedy  strategy,  therefore, 
would  take  item  1  first.  As  you  can  see  from  the  case  analysis  in  Figure  16.2(b), 
however,  the  optimal  solution  takes  items  2  and  3,  leaving  item  1  behind.  The  two 
possible  solutions  that  take  item  1  are  both  suboptimal. 
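
The failure of the value-per-pound rule on this instance can be checked mechanically. The sketch below is illustrative only: greedy_01 and best_01 are hypothetical helper names, and best_01 simply tries every subset, which is feasible only for tiny instances and is not the dynamic-programming method the text alludes to.

```python
from itertools import combinations


def greedy_01(items, capacity):
    """Take whole items in decreasing value-per-pound order while they fit."""
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1],
                   reverse=True)
    remaining, total, chosen = capacity, 0, []
    for i in order:
        value, weight = items[i]
        if weight <= remaining:
            chosen.append(i + 1)             # report 1-based item numbers
            total += value
            remaining -= weight
    return total, chosen


def best_01(items, capacity):
    """Exhaustively try every subset; only sensible for tiny instances."""
    best = (0, ())
    indices = range(len(items))
    for r in range(1, len(items) + 1):
        for subset in combinations(indices, r):
            weight = sum(items[i][1] for i in subset)
            value = sum(items[i][0] for i in subset)
            if weight <= capacity and value > best[0]:
                best = (value, tuple(i + 1 for i in subset))
    return best


items = [(60, 10), (100, 20), (120, 30)]     # (dollars, pounds) for items 1..3
print(greedy_01(items, 50))   # (160, [1, 2])
print(best_01(items, 50))     # (220, (2, 3))
```

The greedy rule commits to item 1 and ends with 160 dollars, while the exhaustive search finds the 220-dollar load of items 2 and 3.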

For the comparable fractional problem, however, the greedy strategy, which
takes item 1 first, does yield an optimal solution, as shown in Figure 16.2(c). Taking
item 1 doesn’t work in the 0-1 problem because the thief is unable to fill his
knapsack to capacity, and the empty space lowers the effective value per pound of
his load. In the 0-1 problem, when we consider whether to include an item in the
knapsack, we must compare the solution to the subproblem that includes the item
with the solution to the subproblem that excludes the item before we can make the
choice. The problem formulated in this way gives rise to many overlapping subproblems,
a hallmark of dynamic programming, and indeed, as Exercise 16.2-2
asks you to show, we can use dynamic programming to solve the 0-1 problem.

Figure 16.2 An example showing that the greedy strategy does not work for the 0-1 knapsack
problem. (a) The thief must select a subset of the three items shown whose weight must not exceed
50 pounds. (b) The optimal subset includes items 2 and 3. Any solution with item 1 is suboptimal,
even though item 1 has the greatest value per pound. (c) For the fractional knapsack problem, taking
the items in order of greatest value per pound yields an optimal solution.

Exercises 

16.2-1 

Prove  that  the  fractional  knapsack  problem  has  the  greedy-choice  property. 

16.2-2 

Give a dynamic-programming solution to the 0-1 knapsack problem that runs in
O(n W) time, where n is the number of items and W is the maximum weight of
items  that  the  thief  can  put  in  his  knapsack. 

16.2-3

Suppose that in a 0-1 knapsack problem, the order of the items when sorted by
increasing  weight  is  the  same  as  their  order  when  sorted  by  decreasing  value.  Give 
an  efficient  algorithm  to  find  an  optimal  solution  to  this  variant  of  the  knapsack 
problem,  and  argue  that  your  algorithm  is  correct. 

16.2-4

Professor  Gekko  has  always  dreamed  of  inline  skating  across  North  Dakota.  He 
plans  to  cross  the  state  on  highway  U.S.  2,  which  runs  from  Grand  Forks,  on  the 
eastern border with Minnesota, to Williston, near the western border with Montana.
The professor can carry two liters of water, and he can skate m miles before running
out of water. (Because North Dakota is relatively flat, the professor does not have
to worry about drinking water at a greater rate on uphill sections than on flat or
downhill sections.) The professor will start in Grand Forks with two full liters of
water.  His  official  North  Dakota  state  map  shows  all  the  places  along  U.S.  2  at 
which  he  can  refill  his  water  and  the  distances  between  these  locations. 

The  professor’s  goal  is  to  minimize  the  number  of  water  stops  along  his  route 
across  the  state.  Give  an  efficient  method  by  which  he  can  determine  which  water 
stops  he  should  make.  Prove  that  your  strategy  yields  an  optimal  solution,  and  give 
its  running  time. 


16.2-5

Describe an efficient algorithm that, given a set {x1, x2, ..., xn} of points on the
real  line,  determines  the  smallest  set  of  unit-length  closed  intervals  that  contains 
all  of  the  given  points.  Argue  that  your  algorithm  is  correct. 

16.2-6 *

Show how to solve the fractional knapsack problem in O(n) time.


16.2-7 

Suppose  you  are  given  two  sets  A  and  B,  each  containing  n  positive  integers.  You 
can choose to reorder each set however you like. After reordering, let ai be the ith
element of set A, and let bi be the ith element of set B. You then receive a payoff
of $\prod_{i=1}^{n} a_i^{b_i}$. Give an algorithm that will maximize your payoff. Prove that your
algorithm  maximizes  the  payoff,  and  state  its  running  time. 


16.3  Huffman  codes 

Huffman  codes  compress  data  very  effectively:  savings  of  20%  to  90%  are  typical, 
depending  on  the  characteristics  of  the  data  being  compressed.  We  consider  the 
data  to  be  a  sequence  of  characters.  Huffman’s  greedy  algorithm  uses  a  table  giving 
how  often  each  character  occurs  (i.e.,  its  frequency)  to  build  up  an  optimal  way  of 
representing  each  character  as  a  binary  string. 

Suppose  we  have  a  100,000-character  data  file  that  we  wish  to  store  compactly. 
We observe that the characters in the file occur with the frequencies given by Figure 16.3.
That is, only 6 different characters appear, and the character a occurs
45,000  times. 

We have many options for how to represent such a file of information. Here,
we consider the problem of designing a binary character code (or code for short)
in which each character is represented by a unique binary string, which we call a
codeword. If we use a fixed-length code, we need 3 bits to represent 6 characters:
a = 000, b = 001, ..., f = 101. This method requires 300,000 bits to code the
entire file. Can we do better?

                            a     b     c     d     e     f
Frequency (in thousands)    45    13    12    16    9     5
Fixed-length codeword       000   001   010   011   100   101
Variable-length codeword    0     101   100   111   1101  1100

Figure 16.3 A character-coding problem. A data file of 100,000 characters contains only the
characters a-f, with the frequencies indicated. If we assign each character a 3-bit codeword, we can
encode the file in 300,000 bits. Using the variable-length code shown, we can encode the file in only
224,000 bits.

A  variable-length  code  can  do  considerably  better  than  a  fixed-length  code,  by 
giving frequent characters short codewords and infrequent characters long codewords.
Figure 16.3 shows such a code; here the 1-bit string 0 represents a, and the
4-bit string 1100 represents f. This code requires

(45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4) · 1,000 = 224,000 bits

to  represent  the  file,  a  savings  of  approximately  25%.  In  fact,  this  is  an  optimal 
character  code  for  this  file,  as  we  shall  see. 
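
The arithmetic can be checked with a short Python sketch. The frequencies (in thousands) and codeword lengths are read off the sum above. Assigning them to the letters a through f in that order is an assumption based on Figure 16.3, and the variable names are illustrative.

```python
# Frequencies in thousands and variable-length codeword lengths per character.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
var_len = {"a": 1, "b": 3, "c": 3, "d": 3, "e": 4, "f": 4}
FIXED_LEN = 3    # bits per character in the fixed-length code

fixed_bits = sum(freq.values()) * FIXED_LEN * 1000
variable_bits = sum(freq[ch] * var_len[ch] for ch in freq) * 1000
savings = 1 - variable_bits / fixed_bits

print(fixed_bits, variable_bits, round(100 * savings, 1))
```

The computed saving is about 25.3%, consistent with the "approximately 25%" quoted above.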

Prefix  codes 

We  consider  here  only  codes  in  which  no  codeword  is  also  a  prefix  of  some  other 
codeword. Such codes are called prefix codes.3 Although we won’t prove it here, a
prefix  code  can  always  achieve  the  optimal  data  compression  among  any  character 
code,  and  so  we  suffer  no  loss  of  generality  by  restricting  our  attention  to  prefix 
codes. 

Encoding is always simple for any binary character code; we just concatenate the
codewords representing each character of the file. For example, with the variable-length
prefix code of Figure 16.3, we code the 3-character file abc as 0 · 101 · 100 =
0101100, where “·” denotes concatenation.

³ Perhaps “prefix-free codes” would be a better name, but the term “prefix codes” is standard in the
literature.

Figure 16.4 Trees corresponding to the coding schemes in Figure 16.3. Each leaf is labeled with
a character and its frequency of occurrence. Each internal node is labeled with the sum of the
frequencies of the leaves in its subtree. (a) The tree corresponding to the fixed-length code a = 000, ...,
f = 101. (b) The tree corresponding to the optimal prefix code a = 0, b = 101, ..., f = 1100.

Prefix codes are desirable because they simplify decoding. Since no codeword
is a prefix of any other, the codeword that begins an encoded file is unambiguous.
We can simply identify the initial codeword, translate it back to the original character,
and repeat the decoding process on the remainder of the encoded file. In our
example, the string 001011101 parses uniquely as 0 · 0 · 101 · 1101, which decodes
to aabe.
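
To see why the prefix property matters, the following sketch enumerates every way of splitting a bit string into codewords. The function `all_parses` and the ambiguous code `AMBIGUOUS` are hypothetical illustrations, not part of the text; `PREFIX_CODE` is the variable-length code of Figure 16.3.

```python
PREFIX_CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
AMBIGUOUS = {"a": "0", "b": "01", "c": "1"}  # "0" is a prefix of "01"


def encode(text, code):
    """Concatenate the codewords of the characters in text."""
    return "".join(code[ch] for ch in text)


def all_parses(bits, code):
    """Return every decoding of bits as a sequence of codewords."""
    if not bits:
        return [""]
    results = []
    for ch, word in code.items():
        if bits.startswith(word):
            rest = all_parses(bits[len(word):], code)
            results.extend(ch + tail for tail in rest)
    return results


unique = all_parses("001011101", PREFIX_CODE)
ambiguous = sorted(all_parses("01", AMBIGUOUS))
```

With the prefix code, at most one codeword can match at each position, so `unique` holds the single parse `aabe`. With the non-prefix code, the string 01 decodes either as `ac` or as `b`.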

The  decoding  process  needs  a  convenient  representation  for  the  prefix  code  so 
that  we  can  easily  pick  off  the  initial  codeword.  A  binary  tree  whose  leaves  are 
the  given  characters  provides  one  such  representation.  We  interpret  the  binary 
codeword  for  a  character  as  the  simple  path  from  the  root  to  that  character,  where  0 
means “go to the left child” and 1 means “go to the right child.” Figure 16.4 shows
the  trees  for  the  two  codes  of  our  example.  Note  that  these  are  not  binary  search 
trees,  since  the  leaves  need  not  appear  in  sorted  order  and  internal  nodes  do  not 
contain  character  keys. 

An optimal code for a file is always represented by a full binary tree, in which
every nonleaf node has two children (see Exercise 16.3-2). The fixed-length code
in our example is not optimal since its tree, shown in Figure 16.4(a), is not a full binary
tree: it contains codewords beginning 10..., but none beginning 11.... Since
we can now restrict our attention to full binary trees, we can say that if C is the
alphabet from which the characters are drawn and all character frequencies are positive,
then the tree for an optimal prefix code has exactly $|C|$ leaves, one for each
letter of the alphabet, and exactly $|C| - 1$ internal nodes (see Exercise B.5-3).
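
A minimal sketch of the tree representation follows, assuming a nested-dictionary encoding of our own choosing: an internal node maps the bits "0" and "1" to its children, and a leaf is simply the character. The helper names `build_tree`, `decode`, and `shape` are hypothetical.

```python
FIXED = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100", "f": "101"}
VARIABLE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def build_tree(code):
    """Build the code tree for a prefix code given as {character: codeword}."""
    root = {}
    for ch, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = ch  # the last bit leads to the leaf
    return root


def decode(bits, root):
    """Walk from the root, emitting a character each time a leaf is reached."""
    out, node = [], root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):
            out.append(node)
            node = root
    return "".join(out)


def shape(node):
    """Return (number of leaves, number of internal nodes, is the tree full?)."""
    if isinstance(node, str):
        return 1, 0, True
    leaves, internal, full = 0, 1, len(node) == 2
    for child in node.values():
        l, i, f = shape(child)
        leaves, internal, full = leaves + l, internal + i, full and f
    return leaves, internal, full


fixed_shape = shape(build_tree(FIXED))
variable_shape = shape(build_tree(VARIABLE))
decoded = decode("001011101", build_tree(VARIABLE))
```

For the fixed-length code, `shape` reports 6 leaves, 6 internal nodes, and a tree that is not full (the node for prefix 1 has only one child). For the variable-length code it reports 6 leaves and 5 internal nodes in a full tree, matching $|C|$ and $|C| - 1$.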

Given a tree T corresponding to a prefix code, we can easily compute the number
of bits required to encode a file. For each character c in the alphabet C, let the
attribute c.freq denote the frequency of c in the file and let $d_T(c)$ denote the depth
of c’s leaf in the tree. Note that $d_T(c)$ is also the length of the codeword for
character c. The number of bits required to encode a file is thus

$$B(T) = \sum_{c \in C} c.\mathit{freq} \cdot d_T(c) , \tag{16.4}$$

which we define as the cost of the tree T.

Constructing  a  Huffman  code 

Huffman  invented  a  greedy  algorithm  that  constructs  an  optimal  prefix  code  called 
a Huffman code. In line with our observations in Section 16.2, its proof of correctness
relies on the greedy-choice property and optimal substructure. Rather
than  demonstrating  that  these  properties  hold  and  then  developing  pseudocode,  we 
present  the  pseudocode  first.  Doing  so  will  help  clarify  how  the  algorithm  makes 
greedy  choices. 

In the pseudocode that follows, we assume that C is a set of n characters and
that each character $c \in C$ is an object with an attribute c.freq giving its frequency.
The algorithm builds the tree T corresponding to the optimal code in a bottom-up
manner. It begins with a set of $|C|$ leaves and performs a sequence of $|C| - 1$
“merging” operations to create the final tree. The algorithm uses a min-priority
queue Q, keyed on the freq attribute, to identify the two least-frequent objects to
merge together. When we merge two objects, the result is a new object whose
frequency is the sum of the frequencies of the two objects that were merged.


HUFFMAN(C)

1  n = |C|
2  Q = C
3  for i = 1 to n - 1
4      allocate a new node z
5      z.left = x = EXTRACT-MIN(Q)
6      z.right = y = EXTRACT-MIN(Q)
7      z.freq = x.freq + y.freq
8      INSERT(Q, z)
9  return EXTRACT-MIN(Q)    // return the root of the tree

For  our  example,  Huffman’s  algorithm  proceeds  as  shown  in  Figure  16.5.  Since 
the  alphabet  contains  6  letters,  the  initial  queue  size  is  n  =  6,  and  5  merge  steps 
build  the  tree.  The  final  tree  represents  the  optimal  prefix  code.  The  codeword  for 
a  letter  is  the  sequence  of  edge  labels  on  the  simple  path  from  the  root  to  the  letter. 

Figure 16.5 The steps of Huffman’s algorithm for the frequencies given in Figure 16.3. Each part
shows the contents of the queue sorted into increasing order by frequency. At each step, the two
trees with lowest frequencies are merged. Leaves are shown as rectangles containing a character
and its frequency. Internal nodes are shown as circles containing the sum of the frequencies of their
children. An edge connecting an internal node with its children is labeled 0 if it is an edge to a left
child and 1 if it is an edge to a right child. The codeword for a letter is the sequence of labels on the
edges connecting the root to the leaf for that letter. (a) The initial set of n = 6 nodes, one for each
letter. (b)-(e) Intermediate stages. (f) The final tree.

Line 2 initializes the min-priority queue Q with the characters in C. The for
loop in lines 3-8 repeatedly extracts the two nodes x and y of lowest frequency
from the queue, replacing them in the queue with a new node z representing their
merger. The frequency of z is computed as the sum of the frequencies of x and y
in line 7. The node z has x as its left child and y as its right child. (This order is
arbitrary; switching the left and right child of any node yields a different code of
the same cost.) After $n - 1$ mergers, line 9 returns the one node left in the queue,
which is the root of the code tree.

Although the algorithm would produce the same result if we were to excise the
variables x and y (assigning directly to z.left and z.right in lines 5 and 6, and
changing line 7 to z.freq = z.left.freq + z.right.freq), we shall use the node
names x and y in the proof of correctness. Therefore, we find it convenient to
leave them in.
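
The pseudocode translates fairly directly into Python using the standard `heapq` module as the min-priority queue. The sketch below is one possible rendering, not the book's implementation: leaves are represented as characters, internal nodes as `(left, right)` tuples, and a running counter (our own addition) breaks ties so that heap entries never compare tree nodes directly.

```python
import heapq
from itertools import count


def huffman(freq):
    """Return the root of a Huffman tree for a {character: frequency} mapping."""
    tiebreak = count()  # unique sequence numbers keep heap entries comparable
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)                        # line 2: Q = C
    for _ in range(len(freq) - 1):             # line 3: n - 1 mergers
        fx, _tx, x = heapq.heappop(heap)       # line 5: x = EXTRACT-MIN(Q)
        fy, _ty, y = heapq.heappop(heap)       # line 6: y = EXTRACT-MIN(Q)
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))  # lines 7-8
    return heap[0][2]                          # line 9: the root


def codewords(node, prefix=""):
    """Read codewords off the tree: 0 for a left edge, 1 for a right edge."""
    if isinstance(node, str):
        return {node: prefix or "0"}  # a one-character alphabet gets codeword 0
    left, right = node
    return {**codewords(left, prefix + "0"), **codewords(right, prefix + "1")}


freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = codewords(huffman(freq))
cost = sum(freq[ch] * len(code[ch]) for ch in freq)
```

On the frequencies of Figure 16.3, this sketch happens to reproduce the codewords of the figure exactly; other tie-breaking or child-ordering choices could yield a different code of the same cost, 224 (in thousands of bits).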

To analyze the running time of Huffman’s algorithm, we assume that Q is implemented
as a binary min-heap (see Chapter 6). For a set C of n characters, we
can initialize Q in line 2 in $O(n)$ time using the BUILD-MIN-HEAP procedure discussed
in Section 6.3. The for loop in lines 3-8 executes exactly $n - 1$ times, and
since each heap operation requires time $O(\lg n)$, the loop contributes $O(n \lg n)$ to
the running time. Thus, the total running time of HUFFMAN on a set of n characters
is $O(n \lg n)$. We can reduce the running time to $O(n \lg \lg n)$ by replacing the
binary min-heap with a van Emde Boas tree (see Chapter 20).

Correctness  of  Huffman’s  algorithm 

To prove that the greedy algorithm HUFFMAN is correct, we show that the problem
of determining an optimal prefix code exhibits the greedy-choice and
optimal-substructure properties. The next lemma shows that the greedy-choice property
holds.

Lemma  16.2 

Let C be an alphabet in which each character $c \in C$ has frequency c.freq. Let
x  and  y  be  two  characters  in  C  having  the  lowest  frequencies.  Then  there  exists 
an  optimal  prefix  code  for  C  in  which  the  codewords  for  x  and  y  have  the  same 
length  and  differ  only  in  the  last  bit. 

Proof  The  idea  of  the  proof  is  to  take  the  tree  T  representing  an  arbitrary  optimal 
prefix  code  and  modify  it  to  make  a  tree  representing  another  optimal  prefix  code 
such  that  the  characters  x  and  y  appear  as  sibling  leaves  of  maximum  depth  in  the 
new  tree.  If  we  can  construct  such  a  tree,  then  the  codewords  for  x  and  y  will  have 
the  same  length  and  differ  only  in  the  last  bit. 

Let a and b be two characters that are sibling leaves of maximum depth in T.
Without loss of generality, we assume that $a.\mathit{freq} \le b.\mathit{freq}$ and $x.\mathit{freq} \le y.\mathit{freq}$.
Since x.freq and y.freq are the two lowest leaf frequencies, in order, and a.freq
and b.freq are two arbitrary frequencies, in order, we have $x.\mathit{freq} \le a.\mathit{freq}$ and
$y.\mathit{freq} \le b.\mathit{freq}$.

In the remainder of the proof, it is possible that we could have x.freq = a.freq
or y.freq = b.freq. However, if we had x.freq = b.freq, then we would also have
a.freq = b.freq = x.freq = y.freq (see Exercise 16.3-1), and the lemma would
be trivially true. Thus, we will assume that $x.\mathit{freq} \ne b.\mathit{freq}$, which means that
$x \ne b$.

Figure 16.6 An illustration of the key step in the proof of Lemma 16.2. In the optimal tree T,
leaves a and b are two siblings of maximum depth. Leaves x and y are the two characters with the
lowest frequencies; they appear in arbitrary positions in T. Assuming that $x \ne b$, swapping leaves a
and x produces tree T', and then swapping leaves b and y produces tree T''. Since each swap does
not increase the cost, the resulting tree T'' is also an optimal tree.

As Figure 16.6 shows, we exchange the positions in T of a and x to produce a
tree T', and then we exchange the positions in T' of b and y to produce a tree T''
in which x and y are sibling leaves of maximum depth. (Note that if x = b but
$y \ne a$, then tree T'' does not have x and y as sibling leaves of maximum depth.
Because we assume that $x \ne b$, this situation cannot occur.) By equation (16.4),
the difference in cost between T and T' is

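$$\begin{aligned}
B(T) - B(T') &= \sum_{c \in C} c.\mathit{freq} \cdot d_T(c) - \sum_{c \in C} c.\mathit{freq} \cdot d_{T'}(c) \\
&= x.\mathit{freq} \cdot d_T(x) + a.\mathit{freq} \cdot d_T(a) - x.\mathit{freq} \cdot d_{T'}(x) - a.\mathit{freq} \cdot d_{T'}(a) \\
&= x.\mathit{freq} \cdot d_T(x) + a.\mathit{freq} \cdot d_T(a) - x.\mathit{freq} \cdot d_T(a) - a.\mathit{freq} \cdot d_T(x) \\
&= (a.\mathit{freq} - x.\mathit{freq})(d_T(a) - d_T(x)) \\
&\ge 0 ,
\end{aligned}$$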

because both $a.\mathit{freq} - x.\mathit{freq}$ and $d_T(a) - d_T(x)$ are nonnegative. More specifically,
$a.\mathit{freq} - x.\mathit{freq}$ is nonnegative because x is a minimum-frequency leaf, and
$d_T(a) - d_T(x)$ is nonnegative because a is a leaf of maximum depth in T. Similarly,
exchanging y and b does not increase the cost, and so $B(T') - B(T'')$ is nonnegative.
Therefore, $B(T'') \le B(T)$, and since T is optimal, we have $B(T) \le B(T'')$,
which implies $B(T'') = B(T)$. Thus, T'' is an optimal tree in which x and y
appear as sibling leaves of maximum depth, from which the lemma follows. ■

Lemma  16.2  implies  that  the  process  of  building  up  an  optimal  tree  by  mergers 
can,  without  loss  of  generality,  begin  with  the  greedy  choice  of  merging  together 
those  two  characters  of  lowest  frequency.  Why  is  this  a  greedy  choice?  We  can 
view  the  cost  of  a  single  merger  as  being  the  sum  of  the  frequencies  of  the  two  items 
being  merged.  Exercise  16.3-4  shows  that  the  total  cost  of  the  tree  constructed 
equals  the  sum  of  the  costs  of  its  mergers.  Of  all  possible  mergers  at  each  step, 
HUFFMAN  chooses  the  one  that  incurs  the  least  cost. 
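
The claim that the tree cost equals the sum of the merger costs can be checked on the running example. The sketch below tracks only frequencies, not tree structure; the function name `merge_costs` is our own.

```python
import heapq


def merge_costs(freqs):
    """Return the cost of each merger performed by Huffman's algorithm."""
    heap = list(freqs)
    heapq.heapify(heap)
    costs = []
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)  # two lowest frequencies
        costs.append(merged)
        heapq.heappush(heap, merged)
    return costs


costs = merge_costs([45, 13, 12, 16, 9, 5])
total = sum(costs)
```

The five mergers cost 14, 25, 30, 55, and 100, which sum to 224, the same value (in thousands of bits) obtained from equation (16.4) for the code of Figure 16.3.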

The next lemma shows that the problem of constructing optimal prefix codes has
the optimal-substructure property.

Lemma  16.3 

Let C be a given alphabet with frequency c.freq defined for each character $c \in C$.
Let x and y be two characters in C with minimum frequency. Let C' be the
alphabet C with the characters x and y removed and a new character z added,
so that $C' = C - \{x, y\} \cup \{z\}$. Define freq for C' as for C, except that
z.freq = x.freq + y.freq. Let T' be any tree representing an optimal prefix code
for the alphabet C'. Then the tree T, obtained from T' by replacing the leaf node
for z with an internal node having x and y as children, represents an optimal prefix
code for the alphabet C.

Proof We first show how to express the cost $B(T)$ of tree T in terms of the
cost $B(T')$ of tree T', by considering the component costs in equation (16.4).
For each character $c \in C - \{x, y\}$, we have that $d_T(c) = d_{T'}(c)$, and hence
$c.\mathit{freq} \cdot d_T(c) = c.\mathit{freq} \cdot d_{T'}(c)$. Since $d_T(x) = d_T(y) = d_{T'}(z) + 1$, we have

$$\begin{aligned}
x.\mathit{freq} \cdot d_T(x) + y.\mathit{freq} \cdot d_T(y) &= (x.\mathit{freq} + y.\mathit{freq})(d_{T'}(z) + 1) \\
&= z.\mathit{freq} \cdot d_{T'}(z) + (x.\mathit{freq} + y.\mathit{freq}) ,
\end{aligned}$$

from which we conclude that

$$B(T) = B(T') + x.\mathit{freq} + y.\mathit{freq}$$

or, equivalently,

$$B(T') = B(T) - x.\mathit{freq} - y.\mathit{freq} .$$

We now prove the lemma by contradiction. Suppose that T does not represent
an optimal prefix code for C. Then there exists an optimal tree T'' such that
$B(T'') < B(T)$. Without loss of generality (by Lemma 16.2), T'' has x and y as
siblings. Let T''' be the tree T'' with the common parent of x and y replaced by a
leaf z with frequency z.freq = x.freq + y.freq. Then

$$\begin{aligned}
B(T''') &= B(T'') - x.\mathit{freq} - y.\mathit{freq} \\
&< B(T) - x.\mathit{freq} - y.\mathit{freq} \\
&= B(T') ,
\end{aligned}$$

yielding a contradiction to the assumption that T' represents an optimal prefix code
for C'. Thus, T must represent an optimal prefix code for the alphabet C. ■
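
As a worked check of the identity $B(T) = B(T') + x.\mathit{freq} + y.\mathit{freq}$ on the running example, take x = f and y = e, so that z.freq = 5 + 9 = 14. In the tree of Figure 16.4(b), e and f sit at depth 4, so the leaf z that replaces their parent sits at depth 3 in T'. The depths of the other leaves are unchanged (frequencies in thousands):

```latex
\begin{aligned}
B(T') &= 45 \cdot 1 + 13 \cdot 3 + 12 \cdot 3 + 16 \cdot 3 + 14 \cdot 3 = 210 , \\
B(T)  &= B(T') + x.\mathit{freq} + y.\mathit{freq} = 210 + 5 + 9 = 224 .
\end{aligned}
```

The value 224 agrees with the cost computed directly from equation (16.4).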

Theorem  16.4 

Procedure  Huffman  produces  an  optimal  prefix  code. 


Proof  Immediate  from  Lemmas  16.2  and  16.3. 

Exercises


16.3-1

Explain why, in the proof of Lemma 16.2, if x.freq = b.freq, then we must have
a.freq = b.freq = x.freq = y.freq.

16.3-2

Prove  that  a  binary  tree  that  is  not  full  cannot  correspond  to  an  optimal  prefix  code. 


16.3-3 

What  is  an  optimal  Huffman  code  for  the  following  set  of  frequencies,  based  on 
the  first  8  Fibonacci  numbers? 

a:1 b:1 c:2 d:3 e:5 f:8 g:13 h:21

Can  you  generalize  your  answer  to  find  the  optimal  code  when  the  frequencies  are 
the  first  n  Fibonacci  numbers? 


16.3-4 

Prove  that  we  can  also  express  the  total  cost  of  a  tree  for  a  code  as  the  sum,  over 
all  internal  nodes,  of  the  combined  frequencies  of  the  two  children  of  the  node. 


16.3-5 

Prove  that  if  we  order  the  characters  in  an  alphabet  so  that  their  frequencies 
are  monotonically  decreasing,  then  there  exists  an  optimal  code  whose  codeword 
lengths  are  monotonically  increasing. 


16.3-6 

Suppose we have an optimal prefix code on a set $C = \{0, 1, \ldots, n - 1\}$ of characters
and we wish to transmit this code using as few bits as possible. Show how to
represent any optimal prefix code on C using only $2n - 1 + n \lceil \lg n \rceil$ bits. (Hint:
Use $2n - 1$ bits to specify the structure of the tree, as discovered by a walk of the
tree.)


16.3-7 

Generalize  Huffman’s  algorithm  to  ternary  codewords  (i.e.,  codewords  using  the 
symbols  0,  1,  and  2),  and  prove  that  it  yields  optimal  ternary  codes. 


16.3-8 

Suppose  that  a  data  file  contains  a  sequence  of  8-bit  characters  such  that  all  256 
characters  are  about  equally  common:  the  maximum  character  frequency  is  less 
than  twice  the  minimum  character  frequency.  Prove  that  Huffman  coding  in  this 
case  is  no  more  efficient  than  using  an  ordinary  8-bit  fixed-length  code. 


16.3-9

Show that no compression scheme can expect to compress a file of randomly chosen
8-bit characters by even a single bit. (Hint: Compare the number of possible
files with the number of possible encoded files.)


★  16.4  Matroids  and  greedy  methods 

In  this  section,  we  sketch  a  beautiful  theory  about  greedy  algorithms.  This  theory 
describes  many  situations  in  which  the  greedy  method  yields  optimal  solutions.  It 
involves  combinatorial  structures  known  as  “matroids.”  Although  this  theory  does 
not  cover  all  cases  for  which  a  greedy  method  applies  (for  example,  it  does  not 
cover the activity-selection problem of Section 16.1 or the Huffman-coding problem
of Section 16.3), it does cover many cases of practical interest. Furthermore,
this  theory  has  been  extended  to  cover  many  applications;  see  the  notes  at  the  end 
of  this  chapter  for  references. 

Matroids 

A matroid is an ordered pair $M = (S, \mathcal{I})$ satisfying the following conditions.

1. S is a finite set.

2. $\mathcal{I}$ is a nonempty family of subsets of S, called the independent subsets of S,
such that if $B \in \mathcal{I}$ and $A \subseteq B$, then $A \in \mathcal{I}$. We say that $\mathcal{I}$ is hereditary if it
satisfies this property. Note that the empty set $\emptyset$ is necessarily a member of $\mathcal{I}$.

3. If $A \in \mathcal{I}$, $B \in \mathcal{I}$, and $|A| < |B|$, then there exists some element $x \in B - A$
such that $A \cup \{x\} \in \mathcal{I}$. We say that M satisfies the exchange property.

The word “matroid” is due to Hassler Whitney. He was studying matric matroids,
in which the elements of S are the rows of a given matrix and a set of rows is
independent  if  they  are  linearly  independent  in  the  usual  sense.  As  Exercise  16.4-2 
asks  you  to  show,  this  structure  defines  a  matroid. 

As another example of matroids, consider the graphic matroid $M_G = (S_G, \mathcal{I}_G)$
defined in terms of a given undirected graph $G = (V, E)$ as follows:

•  The set $S_G$ is defined to be E, the set of edges of G.

•  If A is a subset of E, then $A \in \mathcal{I}_G$ if and only if A is acyclic. That is, a set of
edges A is independent if and only if the subgraph $G_A = (V, A)$ forms a forest.

The  graphic  matroid  MG  is  closely  related  to  the  minimum-spanning-tree  problem, 
which  Chapter  23  covers  in  detail. 
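
Testing independence in a graphic matroid amounts to testing whether an edge set is acyclic. One common way to do so, sketched below under our own naming (`is_acyclic` is not from the text), is a union-find structure: an edge closes a cycle exactly when its endpoints already lie in the same tree.

```python
def is_acyclic(edges):
    """Return True if the undirected edge set forms a forest."""
    parent = {}

    def find(v):
        """Return the representative of v's tree, with path halving."""
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:        # u and v are already connected: edge closes a cycle
            return False
        parent[ru] = rv     # merge the two trees
    return True


path = [(1, 2), (2, 3)]
triangle = [(1, 2), (2, 3), (3, 1)]
```

Here `is_acyclic(path)` holds while `is_acyclic(triangle)` does not, so the path is an independent set of the graphic matroid and the triangle is not. The empty edge set is independent, as the hereditary property requires.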

Theorem 16.5

If $G = (V, E)$ is an undirected graph, then $M_G = (S_G, \mathcal{I}_G)$ is a matroid.

Proof Clearly, $S_G = E$ is a finite set. Furthermore, $\mathcal{I}_G$ is hereditary, since a
subset of a forest is a forest. Putting it another way, removing edges from an
acyclic set of edges cannot create cycles.

Thus, it remains to show that $M_G$ satisfies the exchange property. Suppose that
$G_A = (V, A)$ and $G_B = (V, B)$ are forests of G and that $|B| > |A|$. That is, A
and B are acyclic sets of edges, and B contains more edges than A does.

We claim that a forest $F = (V_F, E_F)$ contains exactly $|V_F| - |E_F|$ trees. To
see why, suppose that F consists of t trees, where the ith tree contains $v_i$ vertices
and $e_i$ edges. Then, we have

$$\begin{aligned}
|E_F| &= \sum_{i=1}^{t} e_i \\
&= \sum_{i=1}^{t} (v_i - 1) \quad \text{(by Theorem B.2)} \\
&= \sum_{i=1}^{t} v_i - t \\
&= |V_F| - t ,
\end{aligned}$$

which implies that $t = |V_F| - |E_F|$. Thus, forest $G_A$ contains $|V| - |A|$ trees, and
forest $G_B$ contains $|V| - |B|$ trees.

Since forest $G_B$ has fewer trees than forest $G_A$ does, forest $G_B$ must contain
some tree T whose vertices are in two different trees in forest $G_A$. Moreover,
since T is connected, it must contain an edge $(u, v)$ such that vertices u and v
are in different trees in forest $G_A$. Since the edge $(u, v)$ connects vertices in two
different trees in forest $G_A$, we can add the edge $(u, v)$ to forest $G_A$ without creating
a cycle. Therefore, $M_G$ satisfies the exchange property, completing the proof that
$M_G$ is a matroid. ■

Given a matroid $M = (S, \mathcal{I})$, we call an element $x \notin A$ an extension of $A \in \mathcal{I}$
if we can add x to A while preserving independence; that is, x is an extension
of A if $A \cup \{x\} \in \mathcal{I}$. As an example, consider a graphic matroid $M_G$. If A is an
independent  set  of  edges,  then  edge  e  is  an  extension  of  A  if  and  only  if  e  is  not 
in  A  and  the  addition  of  e  to  A  does  not  create  a  cycle. 

If  A  is  an  independent  subset  in  a  matroid  M,  we  say  that  A  is  maximal  if  it  has 
no  extensions.  That  is,  A  is  maximal  if  it  is  not  contained  in  any  larger  independent 
subset of M. The following property is often useful.

Theorem 16.6

All maximal independent subsets in a matroid have the same size.

Proof Suppose to the contrary that A is a maximal independent subset of M
and there exists another larger maximal independent subset B of M. Then, the
exchange property implies that for some $x \in B - A$, we can extend A to a larger
independent set $A \cup \{x\}$, contradicting the assumption that A is maximal. ■

As an illustration of this theorem, consider a graphic matroid $M_G$ for a connected,
undirected graph G. Every maximal independent subset of $M_G$ must be a
free tree with exactly $|V| - 1$ edges that connects all the vertices of G. Such a tree
is called a spanning tree of G.
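
For a small graph, this illustration can be verified by brute force. The sketch below enumerates every acyclic edge subset and keeps those not properly contained in another; the example graph (a 4-cycle plus one diagonal) and the helper names are our own assumptions, and the exhaustive search is feasible only for tiny inputs.

```python
from itertools import combinations


def is_forest(edges):
    """Union-find cycle check: True if the edge set contains no cycle."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True


def maximal_independent_sets(E):
    """All independent sets that are not proper subsets of another one."""
    independent = [frozenset(c) for r in range(len(E) + 1)
                   for c in combinations(E, r) if is_forest(c)]
    return [A for A in independent if not any(A < B for B in independent)]


# A connected graph on vertices 1..4: a 4-cycle plus the diagonal (1, 3).
E = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
maximal = maximal_independent_sets(E)
sizes = {len(A) for A in maximal}
```

Every maximal independent set found has exactly $|V| - 1 = 3$ edges, and there are 8 of them, one per spanning tree of this graph.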

We say that a matroid $M = (S, \mathcal{I})$ is weighted if it is associated with a weight
function w that assigns a strictly positive weight $w(x)$ to each element $x \in S$. The
weight function w extends to subsets of S by summation:

$$w(A) = \sum_{x \in A} w(x)$$

for any $A \subseteq S$. For example, if we let $w(e)$ denote the weight of an edge e in a
graphic matroid $M_G$, then $w(A)$ is the total weight of the edges in edge set A.

Greedy  algorithms  on  a  weighted  matroid 

Many problems for which a greedy approach provides optimal solutions can be formulated
in terms of finding a maximum-weight independent subset in a weighted
matroid. That is, we are given a weighted matroid $M = (S, \mathcal{I})$, and we wish to
find an independent set $A \in \mathcal{I}$ such that $w(A)$ is maximized. We call such a subset
that is independent and has maximum possible weight an optimal subset of the
matroid. Because the weight $w(x)$ of any element $x \in S$ is positive, an optimal
subset is always a maximal independent subset: it always helps to make A as large
as possible.

For example, in the minimum-spanning-tree problem, we are given a connected
undirected graph $G = (V, E)$ and a length function w such that $w(e)$ is the (positive)
length of edge e. (We use the term “length” here to refer to the original edge
weights for the graph, reserving the term “weight” to refer to the weights in the
associated matroid.) We wish to find a subset of the edges that connects all of
the vertices together and has minimum total length. To view this as a problem of
finding an optimal subset of a matroid, consider the weighted matroid $M_G$ with
weight function $w'$, where $w'(e) = w_0 - w(e)$ and $w_0$ is larger than the maximum
length of any edge. In this weighted matroid, all weights are positive and an optimal
subset is a spanning tree of minimum total length in the original graph. More
specifically, each maximal independent subset A corresponds to a spanning tree
with $|V| - 1$ edges, and since

$$\begin{aligned}
w'(A) &= \sum_{e \in A} w'(e) \\
&= \sum_{e \in A} (w_0 - w(e)) \\
&= (|V| - 1) w_0 - \sum_{e \in A} w(e) \\
&= (|V| - 1) w_0 - w(A)
\end{aligned}$$

for any maximal independent subset A, an independent subset that maximizes the
quantity $w'(A)$ must minimize $w(A)$. Thus, any algorithm that can find an optimal
subset A in an arbitrary matroid can solve the minimum-spanning-tree problem.
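
As a small worked example of this transformation (the graph and the numbers are our own, chosen only for illustration), take a triangle with edges $e_1, e_2, e_3$ of lengths 1, 2, 3 and let $w_0 = 4$. The matroid weights are then $w'(e_1) = 3$, $w'(e_2) = 2$, $w'(e_3) = 1$, and since $|V| - 1 = 2$, the maximal independent subsets are the three pairs of edges:

```latex
\begin{aligned}
A = \{e_1, e_2\}: \quad & w(A) = 3, \quad w'(A) = 5 = 2 \cdot 4 - 3 , \\
A = \{e_1, e_3\}: \quad & w(A) = 4, \quad w'(A) = 4 = 2 \cdot 4 - 4 , \\
A = \{e_2, e_3\}: \quad & w(A) = 5, \quad w'(A) = 3 = 2 \cdot 4 - 5 .
\end{aligned}
```

The subset of maximum weight $w'$, namely $\{e_1, e_2\}$, is also the spanning tree of minimum total length.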

Chapter 23 gives algorithms for the minimum-spanning-tree problem, but here
we give a greedy algorithm that works for any weighted matroid. The algorithm
takes as input a weighted matroid $M = (S, \mathcal{I})$ with an associated positive weight
function w, and it returns an optimal subset A. In our pseudocode, we denote the
components of M by M.S and M.$\mathcal{I}$ and the weight function by w. The algorithm
is greedy because it considers in turn each element $x \in S$, in order of monotonically
decreasing weight, and immediately adds it to the set A being accumulated if
$A \cup \{x\}$ is independent.

GREEDY(M, w)

1  A = ∅
2  sort M.S into monotonically decreasing order by weight w
3  for each x ∈ M.S, taken in monotonically decreasing order by weight w(x)
4      if A ∪ {x} ∈ M.I
5          A = A ∪ {x}
6  return A

Line 4 checks whether adding each element x to A would maintain A as an independent
set. If A would remain independent, then line 5 adds x to A. Otherwise, x
is discarded. Since the empty set is independent, and since each iteration of the for
loop maintains A’s independence, the subset A is always independent, by induction.
Therefore, GREEDY always returns an independent subset A. We shall see in
a moment that A is a subset of maximum possible weight, so that A is an optimal
subset.

The running time of GREEDY is easy to analyze. Let n denote $|S|$. The sorting
phase of GREEDY takes time $O(n \lg n)$. Line 4 executes exactly n times, once for
each element of S. Each execution of line 4 requires a check on whether or not
the set $A \cup \{x\}$ is independent. If each such check takes time $O(f(n))$, the entire
algorithm runs in time $O(n \lg n + n f(n))$.
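
A minimal Python sketch of GREEDY follows. It assumes the matroid is supplied as a ground set together with an independence oracle, a function `independent(A)` of our own devising that plays the role of the test in line 4. The example uses the uniform matroid of Exercise 16.4-1 (all subsets of size at most k), with made-up weights, and compares the greedy result against a brute-force search.

```python
from itertools import combinations


def greedy(S, independent, w):
    """GREEDY(M, w) with the matroid given as a ground set S and an oracle."""
    A = set()
    for x in sorted(S, key=w, reverse=True):  # lines 2-3: decreasing weight
        if independent(A | {x}):              # line 4: is A ∪ {x} independent?
            A.add(x)                          # line 5
    return A


# Hypothetical example: subsets of size at most k = 2 are independent.
weights = {"p": 4, "q": 7, "r": 1, "s": 5}
S = set(weights)
k = 2


def independent(A):
    return len(A) <= k


best = greedy(S, independent, weights.get)
best_weight = sum(weights[x] for x in best)

# Brute-force optimum over all independent subsets, for comparison.
brute_weight = max(sum(weights[x] for x in c)
                   for r in range(len(S) + 1)
                   for c in combinations(S, r)
                   if independent(set(c)))
```

Here the greedy choice is the two heaviest elements, q and s, with total weight 12, which matches the brute-force optimum. Each oracle call in this example takes constant time, so the sort dominates the running time.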

We now prove that GREEDY returns an optimal subset.

Lemma 16.7 (Matroids exhibit the greedy-choice property)

Suppose that $M = (S, \mathcal{I})$ is a weighted matroid with weight function w and that S
is sorted into monotonically decreasing order by weight. Let x be the first element
of S such that $\{x\}$ is independent, if any such x exists. If x exists, then there exists
an optimal subset A of S that contains x.

Proof If no such x exists, then the only independent subset is the empty set and
the lemma is vacuously true. Otherwise, let B be any nonempty optimal subset.
Assume that $x \notin B$; otherwise, letting A = B gives an optimal subset of S that
contains x.

No element of B has weight greater than $w(x)$. To see why, observe that $y \in B$
implies that $\{y\}$ is independent, since $B \in \mathcal{I}$ and $\mathcal{I}$ is hereditary. Our choice of x
therefore ensures that $w(x) \ge w(y)$ for any $y \in B$.

Construct the set A as follows. Begin with $A = \{x\}$. By the choice of x, set A is
independent. Using the exchange property, repeatedly find a new element of B that
we can add to A until $|A| = |B|$, while preserving the independence of A. At that
point, A and B are the same except that A has x and B has some other element y.
That is, $A = B - \{y\} \cup \{x\}$ for some $y \in B$, and so

$$\begin{aligned}
w(A) &= w(B) - w(y) + w(x) \\
&\ge w(B) .
\end{aligned}$$

Because set B is optimal, set A, which contains x, must also be optimal. ■

We  next  show  that  if  an  element  is  not  an  option  initially,  then  it  cannot  be  an 
option  later. 

Lemma  16.8 

Let $M = (S, \mathcal{I})$ be any matroid. If x is an element of S that is an extension of
some independent subset A of S, then x is also an extension of $\emptyset$.

Proof Since x is an extension of A, we have that $A \cup \{x\}$ is independent. Since $\mathcal{I}$
is hereditary, $\{x\}$ must be independent. Thus, x is an extension of $\emptyset$. ■

Corollary  16.9 

Let $M = (S, \mathcal{I})$ be any matroid. If x is an element of S such that x is not an
extension of $\emptyset$, then x is not an extension of any independent subset A of S.


Proof  This  corollary  is  simply  the  contrapositive  of  Lemma  16.8. 

Corollary 16.9 says that any element that cannot be used immediately can never
be used. Therefore, GREEDY cannot make an error by passing over any initial
elements in S that are not an extension of $\emptyset$, since they can never be used.

Lemma 16.10 (Matroids exhibit the optimal-substructure property)

Let x be the first element of S chosen by GREEDY for the weighted matroid
$M = (S, \mathcal{I})$. The remaining problem of finding a maximum-weight independent
subset containing x reduces to finding a maximum-weight independent subset
of the weighted matroid $M' = (S', \mathcal{I}')$, where

$$S' = \{ y \in S : \{x, y\} \in \mathcal{I} \} ,$$

$$\mathcal{I}' = \{ B \subseteq S - \{x\} : B \cup \{x\} \in \mathcal{I} \} ,$$

and the weight function for M' is the weight function for M, restricted to S'. (We
call M' the contraction of M by the element x.)

Proof If A is any maximum-weight independent subset of M containing x, then
$A' = A - \{x\}$ is an independent subset of M'. Conversely, any independent subset
A' of M' yields an independent subset $A = A' \cup \{x\}$ of M. Since we have in
both cases that $w(A) = w(A') + w(x)$, a maximum-weight solution in M containing
x yields a maximum-weight solution in M', and vice versa. ■
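
The contraction can be sketched in the same oracle style: given a ground set and an independence test for M, build the ground set and test for M'. The function `contract` and the example matroid are hypothetical; we also exclude x itself from S', in keeping with the requirement $B \subseteq S - \{x\}$ in the definition of $\mathcal{I}'$.

```python
def contract(S, independent, x):
    """Return (S', independent') for the contraction of a matroid by x."""
    S_prime = {y for y in S if y != x and independent({x, y})}

    def independent_prime(B):
        """B is independent in M' iff B avoids x and B ∪ {x} is independent in M."""
        B = set(B)
        return x not in B and B <= S_prime and independent(B | {x})

    return S_prime, independent_prime


# Hypothetical example: the uniform matroid on {1, 2, 3, 4} with k = 2.
S = {1, 2, 3, 4}


def independent(A):
    return len(A) <= 2


S_prime, independent_prime = contract(S, independent, 1)
```

Contracting this matroid by element 1 leaves the ground set {2, 3, 4}, in which the empty set and the singletons are independent but no pair is, since any pair together with element 1 would have three elements.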

Theorem  16.11  (Correctness  of  the  greedy  algorithm  on  matroids) 

If M = (S, I) is a weighted matroid with weight function w, then GREEDY(M, w)
returns an optimal subset.

Proof By Corollary 16.9, any elements that GREEDY passes over initially
because they are not extensions of ∅ can be forgotten about, since they can never
be useful. Once GREEDY selects the first element x, Lemma 16.7 implies that
the algorithm does not err by adding x to A, since there exists an optimal subset
containing x. Finally, Lemma 16.10 implies that the remaining problem is one of
finding an optimal subset in the matroid M' that is the contraction of M by x.
After the procedure GREEDY sets A to {x}, we can interpret all of its remaining
steps as acting in the matroid M' = (S', I'), because B is independent in M' if
and only if B ∪ {x} is independent in M, for all sets B ∈ I'. Thus, the subsequent
operation of GREEDY will find a maximum-weight independent subset for M', and
the overall operation of GREEDY will find a maximum-weight independent subset
for M. ■
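
The greedy procedure whose correctness Theorem 16.11 establishes can be sketched in Python as follows. This is a minimal illustration rather than the book's pseudocode: the function name greedy, the independence-oracle parameter is_independent, and the uniform-matroid example (any set of at most 3 elements is independent, in the spirit of Exercise 16.4-1) are assumptions made for the sketch.

```python
def greedy(elements, weight, is_independent):
    """Greedy selection on a weighted matroid given by an independence oracle.

    elements       -- iterable holding the ground set S (assumed representation)
    weight         -- function mapping an element to its positive weight
    is_independent -- oracle returning True if a list of elements is in I
    """
    chosen = []
    # Consider the elements in order of monotonically decreasing weight.
    for x in sorted(elements, key=weight, reverse=True):
        # Keep x only if it extends the current independent set.
        if is_independent(chosen + [x]):
            chosen.append(x)
    return chosen


# Hypothetical example: a uniform matroid of rank 3 with weight equal to value.
best = greedy([3, 1, 4, 1, 5, 9, 2, 6],
              weight=lambda v: v,
              is_independent=lambda subset: len(subset) <= 3)
print(best)  # [9, 6, 5]
```

The oracle is called once per element, so the running time is the sorting cost plus one independence check per element of S.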


Exercises


16.4-1

Show that (S, I_k) is a matroid, where S is any finite set and I_k is the set of all
subsets of S of size at most k, where k ≤ |S|.

16.4-2 ★

Given an m × n matrix T over some field (such as the reals), show that (S, I) is a
matroid, where S is the set of columns of T and A ∈ I if and only if the columns
in  A  are  linearly  independent. 

16.4-3 ★

Show that if (S, I) is a matroid, then (S, I') is a matroid, where

\[ I' = \{ A' : S - A' \text{ contains some maximal } A \in I \} . \]

That is, the maximal independent sets of (S, I') are just the complements of the
maximal independent sets of (S, I).

16.4-4 ★

Let S be a finite set and let S_1, S_2, ..., S_k be a partition of S into nonempty disjoint
subsets. Define the structure (S, I) by the condition that I = {A : |A ∩ S_i| ≤ 1
for i = 1, 2, ..., k}. Show that (S, I) is a matroid. That is, the set of all sets A
that  contain  at  most  one  member  of  each  subset  in  the  partition  determines  the 
independent  sets  of  a  matroid. 


16.4-5 

Show  how  to  transform  the  weight  function  of  a  weighted  matroid  problem,  where 
the  desired  optimal  solution  is  a  minimum-weight  maximal  independent  subset,  to 
make it a standard weighted-matroid problem. Argue carefully that your
transformation is correct.


★  16.5  A  task-scheduling  problem  as  a  matroid 

An interesting problem that we can solve using matroids is the problem of
optimally scheduling unit-time tasks on a single processor, where each task has a
deadline,  along  with  a  penalty  paid  if  the  task  misses  its  deadline.  The  problem 
looks  complicated,  but  we  can  solve  it  in  a  surprisingly  simple  manner  by  casting 
it  as  a  matroid  and  using  a  greedy  algorithm. 

A  unit-time  task  is  a  job,  such  as  a  program  to  be  run  on  a  computer,  that  requires 
exactly  one  unit  of  time  to  complete.  Given  a  finite  set  S  of  unit-time  tasks,  a 
schedule for S is a permutation of S specifying the order in which to perform
these tasks. The first task in the schedule begins at time 0 and finishes at time 1,
the  second  task  begins  at  time  1  and  finishes  at  time  2,  and  so  on. 

The  problem  of  scheduling  unit-time  tasks  with  deadlines  and  penalties  for  a 
single  processor  has  the  following  inputs: 

• a set S = {a_1, a_2, ..., a_n} of n unit-time tasks;

• a set of n integer deadlines d_1, d_2, ..., d_n, such that each d_i satisfies 1 ≤ d_i ≤ n
and task a_i is supposed to finish by time d_i; and

• a set of n nonnegative weights or penalties w_1, w_2, ..., w_n, such that we incur
a penalty of w_i if task a_i is not finished by time d_i, and we incur no penalty if
a task finishes by its deadline.

We  wish  to  find  a  schedule  for  S  that  minimizes  the  total  penalty  incurred  for 
missed  deadlines. 

Consider  a  given  schedule.  We  say  that  a  task  is  late  in  this  schedule  if  it  finishes 
after its deadline. Otherwise, the task is early in the schedule. We can always
transform an arbitrary schedule into early-first form, in which the early tasks precede
the late tasks. To see why, note that if some early task a_i follows some late task a_j,
then we can switch the positions of a_i and a_j, and a_i will still be early and a_j will
still be late.

Furthermore,  we  claim  that  we  can  always  transform  an  arbitrary  schedule  into 
canonical  form,  in  which  the  early  tasks  precede  the  late  tasks  and  we  schedule 
the  early  tasks  in  order  of  monotonically  increasing  deadlines.  To  do  so,  we  put 
the schedule into early-first form. Then, as long as there exist two early tasks a_i
and a_j finishing at respective times k and k + 1 in the schedule such that d_j < d_i,
we swap the positions of a_i and a_j. Since a_j is early before the swap, k + 1 ≤ d_j.
Therefore, k + 1 < d_i, and so a_i is still early after the swap. Because task a_j is
moved  earlier  in  the  schedule,  it  remains  early  after  the  swap. 

The  search  for  an  optimal  schedule  thus  reduces  to  finding  a  set  A  of  tasks  that 
we  assign  to  be  early  in  the  optimal  schedule.  Having  determined  A,  we  can  create 
the actual schedule by listing the elements of A in order of monotonically increasing
deadlines, then listing the late tasks (i.e., S − A) in any order, producing a
canonical  ordering  of  the  optimal  schedule. 

We  say  that  a  set  A  of  tasks  is  independent  if  there  exists  a  schedule  for  these 
tasks  such  that  no  tasks  are  late.  Clearly,  the  set  of  early  tasks  for  a  schedule  forms 
an  independent  set  of  tasks.  Let  J  denote  the  set  of  all  independent  sets  of  tasks. 

Consider the problem of determining whether a given set A of tasks is independent.
For t = 0, 1, 2, ..., n, let N_t(A) denote the number of tasks in A whose
deadline is t or earlier. Note that N_0(A) = 0 for any set A.


Lemma 16.12

For  any  set  of  tasks  A,  the  following  statements  are  equivalent. 

1. The set A is independent.

2. For t = 0, 1, 2, ..., n, we have N_t(A) ≤ t.

3.  If  the  tasks  in  A  are  scheduled  in  order  of  monotonically  increasing  deadlines, 
then  no  task  is  late. 


Proof To show that (1) implies (2), we prove the contrapositive: if N_t(A) > t for
some t, then there is no way to make a schedule with no late tasks for set A, because
more than t tasks must finish before time t. Therefore, (1) implies (2). If (2) holds,
then (3) must follow: there is no way to “get stuck” when scheduling the tasks in
order of monotonically increasing deadlines, since (2) implies that the ith smallest
deadline is at least i. Finally, (3) trivially implies (1). ■

Using property 2 of Lemma 16.12, we can easily compute whether or not a given
set  of  tasks  is  independent  (see  Exercise  16.5-2). 
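
One way to apply property 2 is to count tasks by deadline and compare running totals against t. The sketch below is an illustration under stated assumptions, not the book's solution: the function name is_independent and the representation of a task set as a list of positive integer deadlines are hypothetical. Deadlines larger than |A| are treated as |A|, which cannot change the answer because N_t(A) ≤ |A| ≤ t whenever t ≥ |A|.

```python
def is_independent(deadlines):
    """Return True if unit-time tasks with these deadlines can all finish on time.

    deadlines -- list of positive integer deadlines, one per task in A
    """
    m = len(deadlines)
    # counts[t] = number of tasks whose (clamped) deadline is exactly t.
    counts = [0] * (m + 1)
    for d in deadlines:
        counts[min(d, m)] += 1
    tasks_due = 0  # running value of N_t(A)
    for t in range(m + 1):
        tasks_due += counts[t]
        if tasks_due > t:  # property 2 fails at this t
            return False
    return True


print(is_independent([4, 2, 4, 3]))     # True
print(is_independent([4, 2, 4, 3, 1]))  # False: five tasks are due by time 4
```

Both loops run over at most |A| + 1 entries, so the check takes time linear in the number of tasks.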

The  problem  of  minimizing  the  sum  of  the  penalties  of  the  late  tasks  is  the  same 
as  the  problem  of  maximizing  the  sum  of  the  penalties  of  the  early  tasks.  The 
following  theorem  thus  ensures  that  we  can  use  the  greedy  algorithm  to  find  an 
independent  set  A  of  tasks  with  the  maximum  total  penalty. 

Theorem  16.13 

If  S  is  a  set  of  unit-time  tasks  with  deadlines,  and  I  is  the  set  of  all  independent 
sets of tasks, then the corresponding system (S, I) is a matroid.

Proof Every subset of an independent set of tasks is certainly independent. To
prove the exchange property, suppose that B and A are independent sets of tasks
and that |B| > |A|. Let k be the largest t such that N_t(B) ≤ N_t(A). (Such a value
of t exists, since N_0(A) = N_0(B) = 0.) Since N_n(B) = |B| and N_n(A) = |A|,
but |B| > |A|, we must have that k < n and that N_j(B) > N_j(A) for all j in
the range k + 1 ≤ j ≤ n. Therefore, B contains more tasks with deadline k + 1
than A does. Let a_i be a task in B − A with deadline k + 1. Let A' = A ∪ {a_i}.

We now show that A' must be independent by using property 2 of Lemma 16.12.
For 0 ≤ t ≤ k, we have N_t(A') = N_t(A) ≤ t, since A is independent. For
k < t ≤ n, we have N_t(A') ≤ N_t(B) ≤ t, since B is independent. Therefore, A'
is independent, completing our proof that (S, I) is a matroid. ■

Figure 16.7 An instance of the problem of scheduling unit-time tasks with deadlines and penalties
for a single processor.

By Theorem 16.11, we can use a greedy algorithm to find a maximum-weight
independent set of tasks A. We can then create an optimal schedule having the
tasks in A as its early tasks. This method is an efficient algorithm for scheduling
unit-time tasks with deadlines and penalties for a single processor. The running
time is O(n^2) using GREEDY, since each of the O(n) independence checks made
by that algorithm takes time O(n) (see Exercise 16.5-2). Problem 16-4 gives a
faster implementation.

Figure  16.7  demonstrates  an  example  of  the  problem  of  scheduling  unit-time 
tasks  with  deadlines  and  penalties  for  a  single  processor.  In  this  example,  the 
greedy algorithm selects, in order, tasks a_1, a_2, a_3, and a_4, then rejects a_5 (because
N_4({a_1, a_2, a_3, a_4, a_5}) = 5) and a_6 (because N_4({a_1, a_2, a_3, a_4, a_6}) = 5), and
finally accepts a_7. The final optimal schedule is

\[ \langle a_2, a_4, a_1, a_3, a_7, a_5, a_6 \rangle , \]

which has a total penalty incurred of w_5 + w_6 = 50.
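
The whole method can be sketched as a short Python program: greedily collect the early set A by decreasing penalty, then list A in deadline order followed by the late tasks. The names schedule and is_independent and the (name, deadline, penalty) tuple layout are assumptions of this sketch. The table of Figure 16.7 is not reproduced in this text, so the seven-task instance below is an assumed one, chosen so that the greedy run matches the walk-through above.

```python
def is_independent(deadlines):
    """Property 2 of Lemma 16.12: N_t(A) <= t for every t."""
    m = len(deadlines)
    counts = [0] * (m + 1)
    for d in deadlines:
        counts[min(d, m)] += 1  # deadlines beyond m cannot cause a violation
    tasks_due = 0
    for t in range(m + 1):
        tasks_due += counts[t]
        if tasks_due > t:
            return False
    return True


def schedule(tasks):
    """tasks: list of (name, deadline, penalty). Returns (order, total penalty)."""
    early = []
    # Greedy step: consider tasks by monotonically decreasing penalty.
    for task in sorted(tasks, key=lambda t: t[2], reverse=True):
        if is_independent([d for _, d, _ in early] + [task[1]]):
            early.append(task)
    early_names = {name for name, _, _ in early}
    late = [t for t in tasks if t[0] not in early_names]
    # Canonical form: early tasks by increasing deadline, then the late tasks.
    early.sort(key=lambda t: t[1])
    order = [name for name, _, _ in early + late]
    return order, sum(w for _, _, w in late)


# Assumed instance, consistent with the example discussed in the text.
tasks = [("a1", 4, 70), ("a2", 2, 60), ("a3", 4, 50), ("a4", 3, 40),
         ("a5", 1, 30), ("a6", 4, 20), ("a7", 6, 10)]
order, penalty = schedule(tasks)
print(order)    # ['a2', 'a4', 'a1', 'a3', 'a7', 'a5', 'a6']
print(penalty)  # 50
```

Each of the n greedy steps rebuilds the deadline list and runs a linear check, which gives the quadratic running time quoted above.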

Exercises 


16.5-1 

Solve  the  instance  of  the  scheduling  problem  given  in  Figure  16.7,  but  with  each 
penalty w_i replaced by 80 − w_i.


16.5-2 

Show how to use property 2 of Lemma 16.12 to determine in time O(|A|) whether
or  not  a  given  set  A  of  tasks  is  independent. 


Problems 


16-1  Coin  changing 

Consider  the  problem  of  making  change  for  n  cents  using  the  fewest  number  of 
coins.  Assume  that  each  coin’s  value  is  an  integer. 


a.  Describe  a  greedy  algorithm  to  make  change  consisting  of  quarters,  dimes, 
nickels,  and  pennies.  Prove  that  your  algorithm  yields  an  optimal  solution. 


b. Suppose that the available coins are in the denominations that are powers of c,
i.e., the denominations are c^0, c^1, ..., c^k for some integers c > 1 and k ≥ 1.
Show  that  the  greedy  algorithm  always  yields  an  optimal  solution. 

c.  Give  a  set  of  coin  denominations  for  which  the  greedy  algorithm  does  not  yield 
an  optimal  solution.  Your  set  should  include  a  penny  so  that  there  is  a  solution 
for  every  value  of  n. 

d. Give an O(nk)-time algorithm that makes change for any set of k different coin
denominations,  assuming  that  one  of  the  coins  is  a  penny. 

16-2  Scheduling  to  minimize  average  completion  time 

Suppose you are given a set S = {a_1, a_2, ..., a_n} of tasks, where task a_i
requires p_i units of processing time to complete, once it has started. You have one
computer on which to run these tasks, and the computer can run only one task at a
time. Let c_i be the completion time of task a_i, that is, the time at which task a_i
completes processing. Your goal is to minimize the average completion time, that is,
to minimize $(1/n) \sum_{i=1}^{n} c_i$. For example, suppose there are two tasks, a_1 and a_2,
with p_1 = 3 and p_2 = 5, and consider the schedule in which a_2 runs first, followed
by a_1. Then c_2 = 5, c_1 = 8, and the average completion time is (5 + 8)/2 = 6.5.
If task a_1 runs first, however, then c_1 = 3, c_2 = 8, and the average completion
time is (3 + 8)/2 = 5.5.
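
The arithmetic in this example can be checked with a few lines of Python. The helper below only evaluates a given non-preemptive order; it does not choose the order, and the name average_completion_time is an assumption of this sketch.

```python
def average_completion_time(processing_times):
    """Average completion time when tasks run back to back in the given order."""
    clock = 0  # time at which the current task completes
    total = 0  # sum of completion times so far
    for p in processing_times:
        clock += p
        total += clock
    return total / len(processing_times)


print(average_completion_time([5, 3]))  # a_2 first, then a_1: 6.5
print(average_completion_time([3, 5]))  # a_1 first, then a_2: 5.5
```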

a. Give an algorithm that schedules the tasks so as to minimize the average
completion time. Each task must run non-preemptively, that is, once task a_i starts, it
must run continuously for p_i units of time. Prove that your algorithm minimizes
the  average  completion  time,  and  state  the  running  time  of  your  algorithm. 

b.  Suppose  now  that  the  tasks  are  not  all  available  at  once.  That  is,  each  task 
cannot start until its release time r_i. Suppose also that we allow preemption, so
that a task can be suspended and restarted at a later time. For example, a task a_i
with processing time p_i = 6 and release time r_i = 1 might start running at
time 1 and be preempted at time 4. It might then resume at time 10 but be
preempted at time 11, and it might finally resume at time 13 and complete at
time 15. Task a_i has run for a total of 6 time units, but its running time has been
divided into three pieces. In this scenario, a_i's completion time is 15. Give
an  algorithm  that  schedules  the  tasks  so  as  to  minimize  the  average  completion 
time  in  this  new  scenario.  Prove  that  your  algorithm  minimizes  the  average 
completion  time,  and  state  the  running  time  of  your  algorithm. 
16-3 Acyclic subgraphs

a. The incidence matrix for an undirected graph G = (V, E) is a |V| × |E|
matrix M such that M_{ve} = 1 if edge e is incident on vertex v, and M_{ve} = 0
otherwise. Argue that a set of columns of M is linearly independent over the field
of integers modulo 2 if and only if the corresponding set of edges is acyclic.
Then, use the result of Exercise 16.4-2 to provide an alternate proof that (E, I),
where a set of edges is independent if and only if it is acyclic, is a matroid.

b.  Suppose  that  we  associate  a  nonnegative  weight  w(e)  with  each  edge  in  an 
undirected  graph  G  =  (V,  E).  Give  an  efficient  algorithm  to  find  an  acyclic 
subset  of  E  of  maximum  total  weight. 

c. Let G = (V, E) be an arbitrary directed graph, and let (E, I) be defined so that
A ∈ I if and only if A does not contain any directed cycles. Give an example
of  a  directed  graph  G  such  that  the  associated  system  (E,  I)  is  not  a  matroid. 
Specify  which  defining  condition  for  a  matroid  fails  to  hold. 

d. The incidence matrix for a directed graph G = (V, E) with no self-loops is a
|V| × |E| matrix M such that M_{ve} = −1 if edge e leaves vertex v, M_{ve} = 1 if
edge e enters vertex v, and M_{ve} = 0 otherwise. Argue that if a set of columns
of  M  is  linearly  independent,  then  the  corresponding  set  of  edges  does  not 
contain  a  directed  cycle. 

e.  Exercise  16.4-2  tells  us  that  the  set  of  linearly  independent  sets  of  columns  of 
any matrix M forms a matroid. Explain carefully why the results of parts (c)
and (d) are not contradictory. How can there fail to be a perfect correspondence
between the notion of a set of edges being acyclic and the notion of the
associated  set  of  columns  of  the  incidence  matrix  being  linearly  independent? 

16-4  Scheduling  variations 

Consider  the  following  algorithm  for  the  problem  from  Section  16.5  of  scheduling 
unit-time  tasks  with  deadlines  and  penalties.  Let  all  n  time  slots  be  initially  empty, 
where time slot i is the unit-length slot of time that finishes at time i. We consider
the tasks in order of monotonically decreasing penalty. When considering task a_j,
if there exists a time slot at or before a_j's deadline d_j that is still empty, assign a_j
to the latest such slot, filling it. If there is no such slot, assign task a_j to the latest
of the as yet unfilled slots.
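
The algorithm just described can be transcribed directly into Python to make its behavior concrete. This is a deliberately naive sketch that scans the slots linearly and does not use the disjoint-set forest that part (b) asks for; the function name slot_schedule, the (name, deadline, penalty) tuples, and the sample instance are all assumptions made for illustration.

```python
def slot_schedule(tasks):
    """tasks: list of (name, deadline, penalty) with deadlines in 1..n.

    Returns a list whose entry i - 1 names the task run in time slot i.
    """
    n = len(tasks)
    slots = [None] * (n + 1)  # slots[i] finishes at time i; index 0 is unused
    # Consider tasks in order of monotonically decreasing penalty.
    for name, deadline, _ in sorted(tasks, key=lambda t: t[2], reverse=True):
        # Empty slots at or before the deadline, in increasing order.
        free = [i for i in range(1, deadline + 1) if slots[i] is None]
        if not free:
            # The task must be late: fall back to all unfilled slots.
            free = [i for i in range(1, n + 1) if slots[i] is None]
        slots[free[-1]] = name  # take the latest candidate slot
    return slots[1:]


# Hypothetical instance with seven unit-time tasks.
tasks = [("a1", 4, 70), ("a2", 2, 60), ("a3", 4, 50), ("a4", 3, 40),
         ("a5", 1, 30), ("a6", 4, 20), ("a7", 6, 10)]
print(slot_schedule(tasks))
# ['a4', 'a2', 'a3', 'a1', 'a7', 'a6', 'a5']
```

In this instance only a5 and a6 end up after their deadlines, for a total penalty of 30 + 20 = 50.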

a.  Argue  that  this  algorithm  always  gives  an  optimal  answer. 

b. Use the fast disjoint-set forest presented in Section 21.3 to implement the
algorithm efficiently. Assume that the set of input tasks has already been sorted into
monotonically decreasing order by penalty. Analyze the running time of your
implementation. 

16-5  Off-line  caching 

Modern  computers  use  a  cache  to  store  a  small  amount  of  data  in  a  fast  memory. 
Even  though  a  program  may  access  large  amounts  of  data,  by  storing  a  small  subset 
of  the  main  memory  in  the  cache— a  small  but  faster  memory— overall  access  time 
can  greatly  decrease.  When  a  computer  program  executes,  it  makes  a  sequence 
⟨r_1, r_2, ..., r_n⟩ of n memory requests, where each request is for a particular data
element. For example, a program that accesses 4 distinct elements {a, b, c, d}
might make the sequence of requests ⟨d, b, d, b, d, a, c, d, b, a, c, b⟩. Let k be the
size of the cache. When the cache contains k elements and the program requests the
(k + 1)st element, the system must decide, for this and each subsequent request,
which k elements to keep in the cache. More precisely, for each request r_i, the
cache-management algorithm checks whether element r_i is already in the cache. If
it is, then we have a cache hit; otherwise, we have a cache miss. Upon a cache
miss, the system retrieves r_i from the main memory, and the cache-management
algorithm must decide whether to keep r_i in the cache. If it decides to keep r_i and
the cache already holds k elements, then it must evict one element to make room
for r_i. The cache-management algorithm evicts data with the goal of minimizing
the  number  of  cache  misses  over  the  entire  sequence  of  requests. 

Typically,  caching  is  an  on-line  problem.  That  is,  we  have  to  make  decisions 
about  which  data  to  keep  in  the  cache  without  knowing  the  future  requests.  Here, 
however,  we  consider  the  off-line  version  of  this  problem,  in  which  we  are  given 
in  advance  the  entire  sequence  of  n  requests  and  the  cache  size  k,  and  we  wish  to 
minimize  the  total  number  of  cache  misses. 

We  can  solve  this  off-line  problem  by  a  greedy  strategy  called  furthest-in-future, 
which  chooses  to  evict  the  item  in  the  cache  whose  next  access  in  the  request 
sequence  comes  furthest  in  the  future. 

a. Write pseudocode for a cache manager that uses the furthest-in-future strategy.
The input should be a sequence ⟨r_1, r_2, ..., r_n⟩ of requests and a cache size k,
and the output should be a sequence of decisions about which data element (if
any) to evict upon each request. What is the running time of your algorithm?

b.  Show  that  the  off-line  caching  problem  exhibits  optimal  substructure. 

c.  Prove  that  furthest-in-future  produces  the  minimum  possible  number  of  cache 
misses. 
Chapter notes

Much  more  material  on  greedy  algorithms  and  matroids  can  be  found  in  Lawler 
[224]  and  Papadimitriou  and  Steiglitz  [271]. 

The  greedy  algorithm  first  appeared  in  the  combinatorial  optimization  literature 
in  a  1971  article  by  Edmonds  [101],  though  the  theory  of  matroids  dates  back  to 
a  1935  article  by  Whitney  [355]. 

Our  proof  of  the  correctness  of  the  greedy  algorithm  for  the  activity-selection 
problem  is  based  on  that  of  Gavril  [131].  The  task-scheduling  problem  is  studied 
in  Lawler  [224];  Horowitz,  Sahni,  and  Rajasekaran  [181];  and  Brassard  and  Bratley 
[54].

Huffman codes were invented in 1952 [185]; Lelewer and Hirschberg [231] survey
data-compression techniques known as of 1987.

An extension of matroid theory to greedoid theory was pioneered by Korte and
Lovász [216, 217, 218, 219], who greatly generalize the theory presented here.
Amortized Analysis


In  an  amortized  analysis,  we  average  the  time  required  to  perform  a  sequence  of 
data-structure operations over all the operations performed. With amortized
analysis, we can show that the average cost of an operation is small, if we average over a
sequence of operations, even though a single operation within the sequence might
be expensive. Amortized analysis differs from average-case analysis in that
probability is not involved; an amortized analysis guarantees the average performance
of  each  operation  in  the  worst  case. 

The  first  three  sections  of  this  chapter  cover  the  three  most  common  techniques 
used  in  amortized  analysis.  Section  17.1  starts  with  aggregate  analysis,  in  which 
we determine an upper bound T(n) on the total cost of a sequence of n operations.
The  average  cost  per  operation  is  then  T(n)/n.  We  take  the  average  cost  as  the 
amortized  cost  of  each  operation,  so  that  all  operations  have  the  same  amortized 
cost. 

Section  17.2  covers  the  accounting  method,  in  which  we  determine  an  amortized 
cost  of  each  operation.  When  there  is  more  than  one  type  of  operation,  each  type  of 
operation  may  have  a  different  amortized  cost.  The  accounting  method  overcharges 
some  operations  early  in  the  sequence,  storing  the  overcharge  as  “prepaid  credit” 
on  specific  objects  in  the  data  structure.  Later  in  the  sequence,  the  credit  pays  for 
operations  that  are  charged  less  than  they  actually  cost. 

Section  17.3  discusses  the  potential  method,  which  is  like  the  accounting  method 
in that we determine the amortized cost of each operation and may overcharge
operations early on to compensate for undercharges later. The potential method
maintains the credit as the “potential energy” of the data structure as a whole instead of
associating  the  credit  with  individual  objects  within  the  data  structure. 

We  shall  use  two  examples  to  examine  these  three  methods.  One  is  a  stack 
with  the  additional  operation  Multipop,  which  pops  several  objects  at  once.  The 
other  is  a  binary  counter  that  counts  up  from  0  by  means  of  the  single  operation 
Increment. 
While reading this chapter, bear in mind that the charges assigned during an
amortized analysis are for analysis purposes only. They need not, and should
not, appear in the code. If, for example, we assign a credit to an object x when
using the accounting method, we have no need to assign an appropriate amount to
some attribute, such as x.credit, in the code.

When  we  perform  an  amortized  analysis,  we  often  gain  insight  into  a  particular 
data  structure,  and  this  insight  can  help  us  optimize  the  design.  In  Section  17.4, 
for  example,  we  shall  use  the  potential  method  to  analyze  a  dynamically  expanding 
and  contracting  table. 


17.1  Aggregate  analysis 

In  aggregate  analysis,  we  show  that  for  all  n,  a  sequence  of  n  operations  takes 
worst-case  time  T(n)  in  total.  In  the  worst  case,  the  average  cost,  or  amortized 
cost,  per  operation  is  therefore  T(n)/n.  Note  that  this  amortized  cost  applies  to 
each  operation,  even  when  there  are  several  types  of  operations  in  the  sequence. 
The  other  two  methods  we  shall  study  in  this  chapter,  the  accounting  method  and 
the  potential  method,  may  assign  different  amortized  costs  to  different  types  of 
operations. 

Stack  operations 

In our first example of aggregate analysis, we analyze stacks that have been
augmented with a new operation. Section 10.1 presented the two fundamental stack
operations, each of which takes O(1) time:

PUSH(S, x) pushes object x onto stack S.

POP(S) pops the top of stack S and returns the popped object. Calling POP on an
empty stack generates an error.

Since each of these operations runs in O(1) time, let us consider the cost of each
to be 1. The total cost of a sequence of n PUSH and POP operations is therefore n,
and the actual running time for n operations is therefore Θ(n).

Now we add the stack operation MULTIPOP(S, k), which removes the k top
objects of stack S, popping the entire stack if the stack contains fewer than k objects.
Of course, we assume that k is positive; otherwise the MULTIPOP operation leaves
the stack unchanged. In the following pseudocode, the operation STACK-EMPTY
returns TRUE if there are no objects currently on the stack, and FALSE otherwise.

MULTIPOP(S, k)
1  while not STACK-EMPTY(S) and k > 0
2      POP(S)
3      k = k − 1

Figure 17.1 shows an example of MULTIPOP.

Figure 17.1 The action of MULTIPOP on a stack S, shown initially in (a). The top 4 objects are
popped by MULTIPOP(S, 4), whose result is shown in (b). The next operation is MULTIPOP(S, 7),
which empties the stack shown in (c) since there were fewer than 7 objects remaining.

What is the running time of MULTIPOP(S, k) on a stack of s objects? The
actual running time is linear in the number of POP operations actually executed,
and thus we can analyze MULTIPOP in terms of the abstract costs of 1 each for
PUSH and POP. The number of iterations of the while loop is the number min(s, k)
of objects popped off the stack. Each iteration of the loop makes one call to POP in
line 2. Thus, the total cost of MULTIPOP is min(s, k), and the actual running time
is a linear function of this cost.
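
As a concrete check of the min(s, k) cost, here is a minimal Python sketch of MULTIPOP on a list-based stack. Using a Python list as the stack and returning the number of pops as the cost are assumptions of this illustration, not part of the pseudocode above.

```python
def multipop(stack, k):
    """Pop up to k objects from the top of a list-based stack; return the cost."""
    cost = 0
    while stack and k > 0:  # "not STACK-EMPTY(S) and k > 0"
        stack.pop()         # one POP, abstract cost 1
        k -= 1
        cost += 1
    return cost


s = [1, 2, 3, 4, 5, 6]
print(multipop(s, 4), s)  # 4 [1, 2]
print(multipop(s, 7), s)  # 2 []
```

The second call asks for 7 pops but pays only for the 2 objects that remain, matching min(s, k).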

Let us analyze a sequence of n PUSH, POP, and MULTIPOP operations on an
initially empty stack. The worst-case cost of a MULTIPOP operation in the sequence
is O(n), since the stack size is at most n. The worst-case time of any stack
operation is therefore O(n), and hence a sequence of n operations costs O(n^2), since we
may have O(n) MULTIPOP operations costing O(n) each. Although this analysis
is correct, the O(n^2) result, which we obtained by considering the worst-case cost
of each operation individually, is not tight.

Using aggregate analysis, we can obtain a better upper bound that considers the
entire sequence of n operations. In fact, although a single MULTIPOP operation
can be expensive, any sequence of n PUSH, POP, and MULTIPOP operations on an
initially empty stack can cost at most O(n). Why? We can pop each object from the
stack at most once for each time we have pushed it onto the stack. Therefore, the
number of times that POP can be called on a nonempty stack, including calls within
MULTIPOP, is at most the number of PUSH operations, which is at most n. For any
value of n, any sequence of n PUSH, POP, and MULTIPOP operations takes a total
of O(n) time. The average cost of an operation is O(n)/n = O(1). In aggregate
analysis, we assign the amortized cost of each operation to be the average cost. In
this example, therefore, all three stack operations have an amortized cost of O(1).

We  emphasize  again  that  although  we  have  just  shown  that  the  average  cost,  and 
hence the running time, of a stack operation is O(1), we did not use probabilistic
reasoning. We actually showed a worst-case bound of O(n) on a sequence of n
operations.  Dividing  this  total  cost  by  n  yielded  the  average  cost  per  operation,  or 
the  amortized  cost. 
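
The gap between the per-operation worst case and the aggregate bound shows up in a small experiment. The sketch below records the abstract cost of each operation in a sequence; the tuple encoding of operations and the name operation_costs are assumptions made for this illustration.

```python
def operation_costs(operations):
    """Return the abstract cost of each operation on an initially empty stack.

    operations -- list of ("push", x), ("pop",), or ("multipop", k) tuples
    """
    stack = []
    costs = []
    for op in operations:
        if op[0] == "push":
            stack.append(op[1])
            costs.append(1)
        elif op[0] == "pop":
            stack.pop()  # raises IndexError on an empty stack, as POP would
            costs.append(1)
        else:
            popped = min(len(stack), op[1])  # MULTIPOP costs min(s, k)
            del stack[len(stack) - popped:]
            costs.append(popped)
    return costs


# Nine pushes followed by one expensive MULTIPOP: n = 10 operations.
ops = [("push", i) for i in range(9)] + [("multipop", 9)]
costs = operation_costs(ops)
print(max(costs), sum(costs), sum(costs) / len(ops))  # 9 18 1.8
```

A single operation costs 9 here, yet the total over 10 operations is 18, so the average stays below 2, in line with the O(1) amortized cost.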

Incrementing  a  binary  counter 

As  another  example  of  aggregate  analysis,  consider  the  problem  of  implementing 
a k-bit binary counter that counts upward from 0. We use an array A[0 . . k − 1] of
bits, where A.length = k, as the counter. A binary number x that is stored in the
counter has its lowest-order bit in A[0] and its highest-order bit in A[k − 1], so that
$x = \sum_{i=0}^{k-1} A[i] \cdot 2^i$. Initially, x = 0, and thus A[i] = 0 for i = 0, 1, ..., k − 1. To
add 1 (modulo 2^k) to the value in the counter, we use the following procedure.

INCREMENT(A)
1  i = 0
2  while i < A.length and A[i] == 1
3      A[i] = 0
4      i = i + 1
5  if i < A.length
6      A[i] = 1

Figure  17.2  shows  what  happens  to  a  binary  counter  as  we  increment  it  16  times, 
starting  with  the  initial  value  0  and  ending  with  the  value  16.  At  the  start  of 
each  iteration  of  the  while  loop  in  lines  2-4,  we  wish  to  add  a  1  into  position  i. 
If  A[i]  =  1,  then  adding  1  flips  the  bit  to  0  in  position  i  and  yields  a  carry  of  1, 
to  be  added  into  position  i  +  1  on  the  next  iteration  of  the  loop.  Otherwise,  the 
loop  ends,  and  then,  if  i  <  k,  we  know  that  A[i]  =  0,  so  that  line  6  adds  a  1  into 
position  i,  flipping  the  0  to  a  1.  The  cost  of  each  INCREMENT  operation  is  linear 
in  the  number  of  bits  flipped. 
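
A direct Python rendering of INCREMENT makes the per-call cost visible. In this sketch the counter is a Python list with the lowest-order bit at index 0, and the function returns the number of bits it flipped; the return value is an addition for illustration and is not part of the pseudocode.

```python
def increment(bits):
    """Add 1 (modulo 2**len(bits)) to a counter stored lowest-order bit first.

    Returns the number of bits flipped by this call.
    """
    i = 0
    flips = 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0  # a 1 flips to 0 and the carry moves up
        flips += 1
        i += 1
    if i < len(bits):
        bits[i] = 1  # the carry lands on the first 0 bit
        flips += 1
    return flips


counter = [0] * 8
costs = [increment(counter) for _ in range(16)]
print(counter)     # [0, 0, 0, 0, 1, 0, 0, 0], the value 16
print(sum(costs))  # 31
```

Sixteen increments flip 31 bits in total, fewer than twice the number of operations.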

As  with  the  stack  example,  a  cursory  analysis  yields  a  bound  that  is  correct  but 
not tight. A single execution of INCREMENT takes time Θ(k) in the worst case, in
which array A contains all 1s. Thus, a sequence of n INCREMENT operations on
an initially zero counter takes time O(nk) in the worst case.

Figure 17.2 An 8-bit binary counter as its value goes from 0 to 16 by a sequence of 16 INCREMENT
operations. Bits that flip to achieve the next value are shaded. The running cost for flipping bits is
shown at the right. Notice that the total cost is always less than twice the total number of INCREMENT
operations.

We can tighten our analysis to yield a worst-case cost of O(n) for a sequence of n
INCREMENT operations by observing that not all bits flip each time INCREMENT
is called. As Figure 17.2 shows, A[0] does flip each time INCREMENT is called.
The next bit up, A[1], flips only every other time: a sequence of n INCREMENT
operations on an initially zero counter causes A[1] to flip $\lfloor n/2 \rfloor$ times. Similarly,
bit A[2] flips only every fourth time, or $\lfloor n/4 \rfloor$ times in a sequence of n INCREMENT
operations. In general, for i = 0, 1, ..., k - 1, bit A[i] flips $\lfloor n/2^i \rfloor$ times in a
sequence of n INCREMENT operations on an initially zero counter. For i ≥ k,
bit A[i] does not exist, and so it cannot flip. The total number of flips in the
sequence is thus

\[ \sum_{i=0}^{k-1} \left\lfloor \frac{n}{2^i} \right\rfloor < n \sum_{i=0}^{\infty} \frac{1}{2^i} = 2n , \]

by equation (A.6). The worst-case time for a sequence of n INCREMENT operations
on an initially zero counter is therefore O(n). The average cost of each operation,
and therefore the amortized cost per operation, is O(n)/n = O(1).
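
The aggregate bound can also be checked empirically. The sketch below is an illustration, not part of the original analysis; `increment` and `total_flips` are hypothetical helper names. It counts the flips over n increments of an initially zero counter, which should be fewer than 2n and should equal the sum of floor(n / 2^i) over the k bit positions.

```python
def increment(A):
    """Increment the bit list A (A[0] is the low-order bit); return bits flipped."""
    flips = 0
    i = 0
    while i < len(A) and A[i] == 1:
        A[i] = 0
        flips += 1
        i += 1
    if i < len(A):
        A[i] = 1
        flips += 1
    return flips


def total_flips(n, k):
    """Total number of bit flips over n INCREMENTs of an initially zero k-bit counter."""
    A = [0] * k
    return sum(increment(A) for _ in range(n))


for n in (1, 16, 100, 1000):
    flips = total_flips(n, k=8)
    # Bit i flips floor(n / 2**i) times, and the total stays below 2n.
    assert flips == sum(n >> i for i in range(8))
    assert flips < 2 * n
```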
Exercises

17.1-1

If the set of stack operations included a MULTIPUSH operation, which pushes k
items onto the stack, would the O(1) bound on the amortized cost of stack operations
continue to hold?

17.1-2

Show that if a DECREMENT operation were included in the k-bit counter example,
n operations could cost as much as Θ(nk) time.

17.1-3

Suppose we perform a sequence of n operations on a data structure in which the ith
operation costs i if i is an exact power of 2, and 1 otherwise. Use aggregate analysis
to determine the amortized cost per operation.


17.2  The  accounting  method 


In the accounting method of amortized analysis, we assign differing charges to
different operations, with some operations charged more or less than they actually
cost. We call the amount we charge an operation its amortized cost. When
an operation's amortized cost exceeds its actual cost, we assign the difference to
specific objects in the data structure as credit. Credit can help pay for later operations
whose amortized cost is less than their actual cost. Thus, we can view the
amortized cost of an operation as being split between its actual cost and credit that
is either deposited or used up. Different operations may have different amortized
costs. This method differs from aggregate analysis, in which all operations have
the same amortized cost.

We must choose the amortized costs of operations carefully. If we want to show
that in the worst case the average cost per operation is small by analyzing with
amortized costs, we must ensure that the total amortized cost of a sequence of operations
provides an upper bound on the total actual cost of the sequence. Moreover,
as in aggregate analysis, this relationship must hold for all sequences of operations.
If we denote the actual cost of the ith operation by $c_i$ and the amortized cost
of the ith operation by $\hat{c}_i$, we require

\[ \sum_{i=1}^{n} \hat{c}_i \ge \sum_{i=1}^{n} c_i \tag{17.1} \]

for all sequences of n operations. The total credit stored in the data structure
is the difference between the total amortized cost and the total actual cost, or
$\sum_{i=1}^{n} \hat{c}_i - \sum_{i=1}^{n} c_i$. By inequality (17.1), the total credit associated with the data
structure must be nonnegative at all times. If we ever were to allow the total credit
to become negative (the result of undercharging early operations with the promise
of repaying the account later on), then the total amortized costs incurred at that
time would be below the total actual costs incurred; for the sequence of operations
up to that time, the total amortized cost would not be an upper bound on the total
actual cost. Thus, we must take care that the total credit in the data structure never
becomes negative.

Stack  operations 

To  illustrate  the  accounting  method  of  amortized  analysis,  let  us  return  to  the  stack 
example.  Recall  that  the  actual  costs  of  the  operations  were 

PUSH      1 ,
POP       1 ,
MULTIPOP  min(k, s) ,

where k is the argument supplied to MULTIPOP and s is the stack size when it is
called. Let us assign the following amortized costs:

PUSH      2 ,
POP       0 ,
MULTIPOP  0 .

Note  that  the  amortized  cost  of  Multipop  is  a  constant  (0),  whereas  the  actual  cost 
is  variable.  Here,  all  three  amortized  costs  are  constant.  In  general,  the  amortized 
costs  of  the  operations  under  consideration  may  differ  from  each  other,  and  they 
may  even  differ  asymptotically. 

We  shall  now  show  that  we  can  pay  for  any  sequence  of  stack  operations  by 
charging  the  amortized  costs.  Suppose  we  use  a  dollar  bill  to  represent  each  unit 
of  cost.  We  start  with  an  empty  stack.  Recall  the  analogy  of  Section  10.1  between 
the  stack  data  structure  and  a  stack  of  plates  in  a  cafeteria.  When  we  push  a  plate 
on  the  stack,  we  use  1  dollar  to  pay  the  actual  cost  of  the  push  and  are  left  with  a 
credit  of  1  dollar  (out  of  the  2  dollars  charged),  which  we  leave  on  top  of  the  plate. 
At  any  point  in  time,  every  plate  on  the  stack  has  a  dollar  of  credit  on  it. 

The  dollar  stored  on  the  plate  serves  as  prepayment  for  the  cost  of  popping  it 
from  the  stack.  When  we  execute  a  POP  operation,  we  charge  the  operation  nothing 
and  pay  its  actual  cost  using  the  credit  stored  in  the  stack.  To  pop  a  plate,  we  take 
the  dollar  of  credit  off  the  plate  and  use  it  to  pay  the  actual  cost  of  the  operation. 
Thus, by charging the PUSH operation a little bit more, we can charge the POP
operation nothing.

Moreover, we can also charge MULTIPOP operations nothing. To pop the first
plate,  we  take  the  dollar  of  credit  off  the  plate  and  use  it  to  pay  the  actual  cost  of  a 
POP  operation.  To  pop  a  second  plate,  we  again  have  a  dollar  of  credit  on  the  plate 
to  pay  for  the  POP  operation,  and  so  on.  Thus,  we  have  always  charged  enough 
up  front  to  pay  for  Multipop  operations.  In  other  words,  since  each  plate  on  the 
stack  has  1  dollar  of  credit  on  it,  and  the  stack  always  has  a  nonnegative  number  of 
plates,  we  have  ensured  that  the  amount  of  credit  is  always  nonnegative.  Thus,  for 
any  sequence  of  n  Push,  Pop,  and  Multipop  operations,  the  total  amortized  cost 
is  an  upper  bound  on  the  total  actual  cost.  Since  the  total  amortized  cost  is  O(n), 
so  is  the  total  actual  cost. 
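
The bookkeeping in this argument can be replayed mechanically. The following sketch is illustrative only: the function name `run_with_credit` and the tuple encoding of operations are assumptions, not part of the original text. It charges the amortized costs above, tracks the stored credit, and asserts that the credit always equals the number of plates on the stack.

```python
# Amortized charges taken from the text: PUSH 2, POP 0, MULTIPOP 0.
AMORTIZED = {"push": 2, "pop": 0, "multipop": 0}


def run_with_credit(ops):
    """Replay ops and return (total_actual, total_amortized).

    ops is a list of tuples: ("push", x), ("pop",), or ("multipop", k).
    """
    stack = []
    credit = 0
    total_actual = 0
    total_amortized = 0
    for op in ops:
        name = op[0]
        if name == "push":
            stack.append(op[1])
            actual = 1
        elif name == "pop":
            stack.pop()  # raises IndexError on an empty stack
            actual = 1
        else:  # multipop removes min(k, s) items
            actual = min(op[1], len(stack))
            del stack[len(stack) - actual:]
        credit += AMORTIZED[name] - actual
        # One dollar sits on every plate, so the credit is never negative.
        assert credit == len(stack)
        total_actual += actual
        total_amortized += AMORTIZED[name]
    return total_actual, total_amortized
```

For any valid sequence, the second returned value is at least the first, which is the upper-bound property the accounting method requires.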

Incrementing  a  binary  counter 

As another illustration of the accounting method, we analyze the INCREMENT
operation on a binary counter that starts at zero. As we observed earlier, the running
time of this operation is proportional to the number of bits flipped, which we shall
use as our cost for this example. Let us once again use a dollar bill to represent
each unit of cost (the flipping of a bit in this example).

For the amortized analysis, let us charge an amortized cost of 2 dollars to set a
bit to 1. When a bit is set, we use 1 dollar (out of the 2 dollars charged) to pay
for the actual setting of the bit, and we place the other dollar on the bit as credit to
be used later when we flip the bit back to 0. At any point in time, every 1 in the
counter has a dollar of credit on it, and thus we can charge nothing to reset a bit
to 0; we just pay for the reset with the dollar bill on the bit.

Now we can determine the amortized cost of INCREMENT. The cost of resetting
the bits within the while loop is paid for by the dollars on the bits that are reset. The
INCREMENT procedure sets at most one bit, in line 6, and therefore the amortized
cost of an INCREMENT operation is at most 2 dollars. The number of 1s in the
counter never becomes negative, and thus the amount of credit stays nonnegative
at all times. Thus, for n INCREMENT operations, the total amortized cost is O(n),
which bounds the total actual cost.
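
The same credit argument can be traced in code. In this sketch the list `credit` and the function `increment_with_credit` are hypothetical names introduced for illustration: each 1 bit carries one dollar, setting a bit is charged 2 dollars, and resetting a bit is paid for by the dollar stored on it.

```python
def increment_with_credit(A, credit):
    """Perform one INCREMENT on bit list A; credit[i] is the dollars stored on bit i.

    Returns (actual_cost, amortized_cost) for this operation.
    """
    actual = 0
    amortized = 0
    i = 0
    while i < len(A) and A[i] == 1:
        # The reset is paid by the dollar on the bit; nothing is charged.
        assert credit[i] == 1
        credit[i] = 0
        A[i] = 0
        actual += 1
        i += 1
    if i < len(A):
        # Charge 2 dollars: 1 pays for setting the bit, 1 is stored on it.
        A[i] = 1
        credit[i] = 1
        actual += 1
        amortized += 2
    return actual, amortized
```

Summing the two returned values over a run of increments gives the total actual cost and the total amortized cost; the invariant is that `sum(credit)` equals the number of 1s in `A`.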

Exercises

17.2-1

Suppose we perform a sequence of stack operations on a stack whose size never
exceeds k. After every k operations, we make a copy of the entire stack for backup
purposes. Show that the cost of n stack operations, including copying the stack,
is O(n) by assigning suitable amortized costs to the various stack operations.

17.2-2

Redo Exercise 17.1-3 using an accounting method of analysis.

17.2-3

Suppose we wish not only to increment a counter but also to reset it to zero (i.e.,
make all bits in it 0). Counting the time to examine or modify a bit as Θ(1),
show how to implement a counter as an array of bits so that any sequence of n
INCREMENT and RESET operations takes time O(n) on an initially zero counter.
(Hint: Keep a pointer to the high-order 1.)


17.3  The  potential  method 

Instead  of  representing  prepaid  work  as  credit  stored  with  specific  objects  in  the 
data  structure,  the  potential  method  of  amortized  analysis  represents  the  prepaid 
work  as  “potential  energy,”  or  just  “potential,”  which  can  be  released  to  pay  for 
future  operations.  We  associate  the  potential  with  the  data  structure  as  a  whole 
rather  than  with  specific  objects  within  the  data  structure. 

The potential method works as follows. We will perform n operations, starting
with an initial data structure $D_0$. For each i = 1, 2, ..., n, we let $c_i$ be the actual
cost of the ith operation and $D_i$ be the data structure that results after applying
the ith operation to data structure $D_{i-1}$. A potential function $\Phi$ maps each data
structure $D_i$ to a real number $\Phi(D_i)$, which is the potential associated with data
structure $D_i$. The amortized cost $\hat{c}_i$ of the ith operation with respect to potential
function $\Phi$ is defined by

\[ \hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1}) . \tag{17.2} \]

The amortized cost of each operation is therefore its actual cost plus the change in
potential due to the operation. By equation (17.2), the total amortized cost of the n
operations is

\begin{align*}
\sum_{i=1}^{n} \hat{c}_i &= \sum_{i=1}^{n} \bigl( c_i + \Phi(D_i) - \Phi(D_{i-1}) \bigr) \\
&= \sum_{i=1}^{n} c_i + \Phi(D_n) - \Phi(D_0) . \tag{17.3}
\end{align*}

The second equality follows from equation (A.9) because the $\Phi(D_i)$ terms telescope.

If we can define a potential function $\Phi$ so that $\Phi(D_n) \ge \Phi(D_0)$, then the total
amortized cost $\sum_{i=1}^{n} \hat{c}_i$ gives an upper bound on the total actual cost $\sum_{i=1}^{n} c_i$.
In practice, we do not always know how many operations might be performed.
Therefore, if we require that $\Phi(D_i) \ge \Phi(D_0)$ for all i, then we guarantee, as in
the accounting method, that we pay in advance. We usually just define $\Phi(D_0)$ to
be 0 and then show that $\Phi(D_i) \ge 0$ for all i. (See Exercise 17.3-1 for an easy way
to handle cases in which $\Phi(D_0) \ne 0$.)

Intuitively, if the potential difference $\Phi(D_i) - \Phi(D_{i-1})$ of the ith operation is
positive, then the amortized cost $\hat{c}_i$ represents an overcharge to the ith operation,
and the potential of the data structure increases. If the potential difference is negative,
then the amortized cost represents an undercharge to the ith operation, and
the decrease in the potential pays for the actual cost of the operation.

The amortized costs defined by equations (17.2) and (17.3) depend on the choice
of the potential function $\Phi$. Different potential functions may yield different amortized
costs yet still be upper bounds on the actual costs. We often find trade-offs
that we can make in choosing a potential function; the best potential function to
use depends on the desired time bounds.
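
To make equations (17.2) and (17.3) concrete, the sketch below computes amortized costs from a list of actual costs and a list of potential values. The function name `amortized_costs` and the sample numbers are illustrative assumptions; the numbers correspond to three unit-cost operations that each raise the potential by 1, followed by one operation of cost 3 that releases all of it.

```python
def amortized_costs(actual, potential):
    """Apply equation (17.2) to every operation.

    actual[i - 1] is the actual cost c_i of the ith operation, and
    potential[i] is Phi(D_i), with potential[0] = Phi(D_0).
    """
    if len(potential) != len(actual) + 1:
        raise ValueError("need one potential value per data structure D_0..D_n")
    return [c + potential[i + 1] - potential[i] for i, c in enumerate(actual)]


actual = [1, 1, 1, 3]
potential = [0, 1, 2, 3, 0]
amortized = amortized_costs(actual, potential)

# Equation (17.3): the potential terms telescope.
assert sum(amortized) == sum(actual) + potential[-1] - potential[0]
```

Because the final potential here is no smaller than the initial one, the total amortized cost is an upper bound on the total actual cost.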

Stack  operations 

To illustrate the potential method, we return once again to the example of the stack
operations PUSH, POP, and MULTIPOP. We define the potential function $\Phi$ on a
stack to be the number of objects in the stack. For the empty stack $D_0$ with which
we start, we have $\Phi(D_0) = 0$. Since the number of objects in the stack is never
negative, the stack $D_i$ that results after the ith operation has nonnegative potential,
and thus

\[ \Phi(D_i) \ge 0 = \Phi(D_0) . \]

The total amortized cost of n operations with respect to $\Phi$ therefore represents an
upper bound on the actual cost.

Let us now compute the amortized costs of the various stack operations. If the ith
operation on a stack containing s objects is a PUSH operation, then the potential
difference is

\[ \Phi(D_i) - \Phi(D_{i-1}) = (s + 1) - s = 1 . \]

By equation (17.2), the amortized cost of this PUSH operation is

\[ \hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1}) = 1 + 1 = 2 . \]

Suppose that the ith operation on the stack is MULTIPOP(S, k), which causes
k' = min(k, s) objects to be popped off the stack. The actual cost of the operation
is k', and the potential difference is

\[ \Phi(D_i) - \Phi(D_{i-1}) = -k' . \]

Thus, the amortized cost of the MULTIPOP operation is

\[ \hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1}) = k' - k' = 0 . \]

Similarly, the amortized cost of an ordinary POP operation is 0.

The amortized cost of each of the three operations is O(1), and thus the total
amortized cost of a sequence of n operations is O(n). Since we have already argued
that $\Phi(D_i) \ge \Phi(D_0)$, the total amortized cost of n operations is an upper bound
on the total actual cost. The worst-case cost of n operations is therefore O(n).
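
These amortized costs can be recomputed by taking the potential to be the stack size before and after each operation. The helper below is a sketch under that assumption; the name `stack_amortized` and the tuple encoding of operations are hypothetical.

```python
def stack_amortized(ops):
    """Return [(name, amortized_cost)] using the potential Phi = number of items.

    ops is a list of tuples: ("push", x), ("pop",), or ("multipop", k).
    """
    stack = []
    result = []
    for op in ops:
        before = len(stack)  # Phi(D_{i-1})
        if op[0] == "push":
            stack.append(op[1])
            actual = 1
        elif op[0] == "pop":
            stack.pop()
            actual = 1
        else:  # multipop pops k' = min(k, s) items
            actual = min(op[1], len(stack))
            del stack[len(stack) - actual:]
        after = len(stack)  # Phi(D_i)
        result.append((op[0], actual + after - before))
    return result
```

Every PUSH should come out at 2, and every POP and MULTIPOP at 0, regardless of how many items a MULTIPOP actually removes.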

Incrementing  a  binary  counter 

As another example of the potential method, we again look at incrementing a binary
counter. This time, we define the potential of the counter after the ith INCREMENT
operation to be $b_i$, the number of 1s in the counter after the ith operation.

Let us compute the amortized cost of an INCREMENT operation. Suppose that
the ith INCREMENT operation resets $t_i$ bits. The actual cost of the operation is
therefore at most $t_i + 1$, since in addition to resetting $t_i$ bits, it sets at most one
bit to 1. If $b_i = 0$, then the ith operation resets all k bits, and so $b_{i-1} = t_i = k$.
If $b_i > 0$, then $b_i = b_{i-1} - t_i + 1$. In either case, $b_i \le b_{i-1} - t_i + 1$, and the
potential difference is

\[ \Phi(D_i) - \Phi(D_{i-1}) \le (b_{i-1} - t_i + 1) - b_{i-1} = 1 - t_i . \]

The amortized cost is therefore

\[ \hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1}) \le (t_i + 1) + (1 - t_i) = 2 . \]

If the counter starts at zero, then $\Phi(D_0) = 0$. Since $\Phi(D_i) \ge 0$ for all i, the total
amortized cost of a sequence of n INCREMENT operations is an upper bound on the
total actual cost, and so the worst-case cost of n INCREMENT operations is O(n).

The potential method gives us an easy way to analyze the counter even when
it does not start at zero. The counter starts with $b_0$ 1s, and after n INCREMENT
operations it has $b_n$ 1s, where $0 \le b_0, b_n \le k$. (Recall that k is the number of bits
in the counter.) We can rewrite equation (17.3) as

\[ \sum_{i=1}^{n} c_i = \sum_{i=1}^{n} \hat{c}_i - \Phi(D_n) + \Phi(D_0) . \tag{17.4} \]

We have $\hat{c}_i \le 2$ for all $1 \le i \le n$. Since $\Phi(D_0) = b_0$ and $\Phi(D_n) = b_n$, the total
actual cost of n INCREMENT operations is

\[ \sum_{i=1}^{n} c_i \le \sum_{i=1}^{n} 2 - b_n + b_0 = 2n - b_n + b_0 . \]

Note in particular that since $b_0 \le k$, as long as k = O(n), the total actual cost
is O(n). In other words, if we execute at least n = Ω(k) INCREMENT operations,
the total actual cost is O(n), no matter what initial value the counter contains.
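
The bound for a counter that does not start at zero can be checked numerically. The sketch below is illustrative; `increment` and `check_bound` are hypothetical helper names. It uses the number of 1s as the potential, confirms that every amortized cost is at most 2, and confirms that the total actual cost is at most $2n - b_n + b_0$.

```python
def increment(A):
    """Increment the bit list A (A[0] is the low-order bit); return bits flipped."""
    flips = 0
    i = 0
    while i < len(A) and A[i] == 1:
        A[i] = 0
        flips += 1
        i += 1
    if i < len(A):
        A[i] = 1
        flips += 1
    return flips


def check_bound(start_bits, n):
    """Run n INCREMENTs from start_bits; return (total_actual, b_0, b_n)."""
    A = list(start_bits)
    b0 = sum(A)
    total = 0
    for _ in range(n):
        before = sum(A)          # Phi(D_{i-1}) = number of 1s
        cost = increment(A)
        # Amortized cost = actual cost + change in potential, at most 2.
        assert cost + sum(A) - before <= 2
        total += cost
    bn = sum(A)
    assert total <= 2 * n - bn + b0
    return total, b0, bn
```

For example, `check_bound([1, 1, 1, 0, 1, 1, 0, 1], 40)` starts an 8-bit counter with six 1s and verifies both inequalities along the way.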

Exercises

17.3-1

Suppose we have a potential function $\Phi$ such that $\Phi(D_i) \ge \Phi(D_0)$ for all i, but
$\Phi(D_0) \ne 0$. Show that there exists a potential function $\Phi'$ such that $\Phi'(D_0) = 0$,
$\Phi'(D_i) \ge 0$ for all i ≥ 1, and the amortized costs using $\Phi'$ are the same as the
amortized costs using $\Phi$.

17.3-2

Redo Exercise 17.1-3 using a potential method of analysis.

17.3-3

Consider an ordinary binary min-heap data structure with n elements supporting
the instructions INSERT and EXTRACT-MIN in O(lg n) worst-case time. Give a
potential function $\Phi$ such that the amortized cost of INSERT is O(lg n) and the
amortized cost of EXTRACT-MIN is O(1), and show that it works.

17.3-4

What is the total cost of executing n of the stack operations PUSH, POP, and
MULTIPOP, assuming that the stack begins with $s_0$ objects and finishes with $s_n$
objects?

17.3-5

Suppose that a counter begins at a number with b 1s in its binary representation,
rather than at 0. Show that the cost of performing n INCREMENT operations
is O(n) if n = Ω(b). (Do not assume that b is constant.)

17.3-6

Show how to implement a queue with two ordinary stacks (Exercise 10.1-6) so that
the amortized cost of each ENQUEUE and each DEQUEUE operation is O(1).

17.3-7

Design a data structure to support the following two operations for a dynamic
multiset S of integers, which allows duplicate values:

INSERT(S, x) inserts x into S.

DELETE-LARGER-HALF(S) deletes the largest $\lceil |S|/2 \rceil$ elements from S.

Explain how to implement this data structure so that any sequence of m INSERT
and DELETE-LARGER-HALF operations runs in O(m) time. Your implementation
should also include a way to output the elements of S in O(|S|) time.


17.4  Dynamic  tables 

We do not always know in advance how many objects some applications will store
in a table. We might allocate space for a table, only to find out later that it is not
enough. We must then reallocate the table with a larger size and copy all objects
stored in the original table over into the new, larger table. Similarly, if many objects
have been deleted from the table, it may be worthwhile to reallocate the table with
a smaller size. In this section, we study this problem of dynamically expanding and
contracting a table. Using amortized analysis, we shall show that the amortized cost
of insertion and deletion is only O(1), even though the actual cost of an operation
is large when it triggers an expansion or a contraction. Moreover, we shall see how
to guarantee that the unused space in a dynamic table never exceeds a constant
fraction of the total space.

We assume that the dynamic table supports the operations TABLE-INSERT and
TABLE-DELETE. TABLE-INSERT inserts into the table an item that occupies a single
slot, that is, a space for one item. Likewise, TABLE-DELETE removes an item
from the table, thereby freeing a slot. The details of the data-structuring method
used to organize the table are unimportant; we might use a stack (Section 10.1),
a heap (Chapter 6), or a hash table (Chapter 11). We might also use an array or
collection of arrays to implement object storage, as we did in Section 10.3.

We shall find it convenient to use a concept introduced in our analysis of hashing
(Chapter 11). We define the load factor α(T) of a nonempty table T to be the
number of items stored in the table divided by the size (number of slots) of the
table. We assign an empty table (one with no items) size 0, and we define its load
factor to be 1. If the load factor of a dynamic table is bounded below by a constant,
the unused space in the table is never more than a constant fraction of the total
amount of space.

We  start  by  analyzing  a  dynamic  table  in  which  we  only  insert  items.  We  then 
consider  the  more  general  case  in  which  we  both  insert  and  delete  items. 

17.4.1  Table  expansion 

Let us assume that storage for a table is allocated as an array of slots. A table fills
up when all slots have been used or, equivalently, when its load factor is 1.¹ In some
software environments, upon attempting to insert an item into a full table, the only
alternative is to abort with an error. We shall assume, however, that our software
environment, like many modern ones, provides a memory-management system that
can allocate and free blocks of storage on request. Thus, upon inserting an item
into a full table, we can expand the table by allocating a new table with more slots
than the old table had. Because we always need the table to reside in contiguous
memory, we must allocate a new array for the larger table and then copy items from
the old table into the new table.

A common heuristic allocates a new table with twice as many slots as the old
one. If the only table operations are insertions, then the load factor of the table is
always at least 1/2, and thus the amount of wasted space never exceeds half the
total space in the table.

In the following pseudocode, we assume that T is an object representing the
table. The attribute T.table contains a pointer to the block of storage representing
the table, T.num contains the number of items in the table, and T.size gives the total
number of slots in the table. Initially, the table is empty: T.num = T.size = 0.

TABLE-INSERT(T, x)
1  if T.size == 0
2      allocate T.table with 1 slot
3      T.size = 1
4  if T.num == T.size
5      allocate new-table with 2 · T.size slots
6      insert all items in T.table into new-table
7      free T.table
8      T.table = new-table
9      T.size = 2 · T.size
10 insert x into T.table
11 T.num = T.num + 1

¹ In some situations, such as an open-address hash table, we may wish to consider a table to be full if
its load factor equals some constant strictly less than 1. (See Exercise 17.4-1.)

Notice that we have two "insertion" procedures here: the TABLE-INSERT procedure
itself and the elementary insertion into a table in lines 6 and 10. We can
analyze the running time of TABLE-INSERT in terms of the number of elementary
insertions by assigning a cost of 1 to each elementary insertion. We assume that
the actual running time of TABLE-INSERT is linear in the time to insert individual
items, so that the overhead for allocating an initial table in line 2 is constant and
the overhead for allocating and freeing storage in lines 5 and 7 is dominated by
the cost of transferring items in line 6. We call the event in which lines 5-9 are
executed an expansion.
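
The TABLE-INSERT pseudocode can be rendered in Python along the following lines. This is a minimal sketch, not a definitive implementation: the class name `DynamicTable`, the use of a Python list with `None` marking unused slots, and the returned count of elementary insertions are all assumptions made for illustration.

```python
class DynamicTable:
    """Doubling table that reports the elementary-insertion cost of each insert."""

    def __init__(self):
        self.table = []  # backing array of slots; None marks an unused slot
        self.num = 0     # number of items stored
        self.size = 0    # number of slots allocated

    def insert(self, x):
        """Insert x and return the number of elementary insertions performed."""
        cost = 0
        if self.size == 0:
            self.table = [None]
            self.size = 1
        if self.num == self.size:
            # Expansion: allocate twice as many slots and copy every item over.
            new_table = [None] * (2 * self.size)
            for j in range(self.num):
                new_table[j] = self.table[j]
                cost += 1
            self.table = new_table
            self.size = 2 * self.size
        self.table[self.num] = x
        self.num += 1
        cost += 1
        return cost
```

Inserting ten items into a fresh table and collecting the returned costs shows the occasional expensive operation among many unit-cost ones.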

Let us analyze a sequence of n TABLE-INSERT operations on an initially empty
table. What is the cost $c_i$ of the ith operation? If the current table has room for the
new item (or if this is the first operation), then $c_i = 1$, since we need only perform
the one elementary insertion in line 10. If the current table is full, however, and an
expansion occurs, then $c_i = i$: the cost is 1 for the elementary insertion in line 10
plus i - 1 for the items that we must copy from the old table to the new table in
line 6. If we perform n operations, the worst-case cost of an operation is O(n),
which leads to an upper bound of O(n²) on the total running time for n operations.

This bound is not tight, because we rarely expand the table in the course of n
TABLE-INSERT operations. Specifically, the ith operation causes an expansion
only when i - 1 is an exact power of 2. The amortized cost of an operation is in
fact O(1), as we can show using aggregate analysis. The cost of the ith operation
is

\[ c_i = \begin{cases} i & \text{if } i - 1 \text{ is an exact power of 2} , \\ 1 & \text{otherwise} . \end{cases} \]

The total cost of n TABLE-INSERT operations is therefore

\[ \sum_{i=1}^{n} c_i \le n + \sum_{j=0}^{\lfloor \lg n \rfloor} 2^j < n + 2n = 3n , \]

because at most n operations cost 1 and the costs of the remaining operations form
a geometric series. Since the total cost of n TABLE-INSERT operations is bounded
by 3n, the amortized cost of a single operation is at most 3.
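
The aggregate bound is easy to confirm for small n. The following sketch encodes the cost of the ith operation exactly as given above and sums it; the function names are hypothetical and chosen only for this illustration.

```python
def is_power_of_two(m):
    """True when m is an exact power of 2 (1, 2, 4, 8, ...)."""
    return m > 0 and (m & (m - 1)) == 0


def insert_cost(i):
    """Cost of the ith TABLE-INSERT on an initially empty table."""
    return i if is_power_of_two(i - 1) else 1


def total_cost(n):
    """Total cost of the first n TABLE-INSERT operations."""
    return sum(insert_cost(i) for i in range(1, n + 1))


# The total never exceeds 3n, so the amortized cost per operation is at most 3.
for n in range(1, 2001):
    assert total_cost(n) <= 3 * n
```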

By using the accounting method, we can gain some feeling for why the amortized
cost of a TABLE-INSERT operation should be 3. Intuitively, each item pays
for 3 elementary insertions: inserting itself into the current table, moving itself
when the table expands, and moving another item that has already been moved
once when the table expands. For example, suppose that the size of the table is m
immediately after an expansion. Then the table holds m/2 items, and it contains
no credit. We charge 3 dollars for each insertion. The elementary insertion that
occurs immediately costs 1 dollar. We place another dollar as credit on the item
inserted. We place the third dollar as credit on one of the m/2 items already in the
table. The table will not fill again until we have inserted another m/2 - 1 items,
and thus, by the time the table contains m items and is full, we will have placed a
dollar on each item to pay to reinsert it during the expansion.

We can use the potential method to analyze a sequence of n TABLE-INSERT
operations, and we shall use it in Section 17.4.2 to design a TABLE-DELETE
operation that has an O(1) amortized cost as well. We start by defining a potential
function $\Phi$ that is 0 immediately after an expansion but builds to the table size by
the time the table is full, so that we can pay for the next expansion by the potential.
The function

\[ \Phi(T) = 2 \cdot T.\mathit{num} - T.\mathit{size} \tag{17.5} \]

is one possibility. Immediately after an expansion, we have T.num = T.size/2,
and thus $\Phi(T) = 0$, as desired. Immediately before an expansion, we have
T.num = T.size, and thus $\Phi(T)$ = T.num, as desired. The initial value of the
potential is 0, and since the table is always at least half full, T.num ≥ T.size/2,
which implies that $\Phi(T)$ is always nonnegative. Thus, the sum of the amortized
costs of n TABLE-INSERT operations gives an upper bound on the sum of the actual
costs.

To analyze the amortized cost of the ith TABLE-INSERT operation, we let $\mathit{num}_i$
denote the number of items stored in the table after the ith operation, $\mathit{size}_i$ denote
the total size of the table after the ith operation, and $\Phi_i$ denote the potential after
the ith operation. Initially, we have $\mathit{num}_0 = 0$, $\mathit{size}_0 = 0$, and $\Phi_0 = 0$.

If the ith TABLE-INSERT operation does not trigger an expansion, then we have
$\mathit{size}_i = \mathit{size}_{i-1}$ and the amortized cost of the operation is

\begin{align*}
\hat{c}_i &= c_i + \Phi_i - \Phi_{i-1} \\
&= 1 + (2 \cdot \mathit{num}_i - \mathit{size}_i) - (2 \cdot \mathit{num}_{i-1} - \mathit{size}_{i-1}) \\
&= 1 + (2 \cdot \mathit{num}_i - \mathit{size}_i) - (2 (\mathit{num}_i - 1) - \mathit{size}_i) \\
&= 3 .
\end{align*}

If the ith operation does trigger an expansion, then we have $\mathit{size}_i = 2 \cdot \mathit{size}_{i-1}$ and
$\mathit{size}_{i-1} = \mathit{num}_{i-1} = \mathit{num}_i - 1$, which implies that $\mathit{size}_i = 2 \cdot (\mathit{num}_i - 1)$. Thus,
the amortized cost of the operation is

\begin{align*}
\hat{c}_i &= c_i + \Phi_i - \Phi_{i-1} \\
&= \mathit{num}_i + (2 \cdot \mathit{num}_i - \mathit{size}_i) - (2 \cdot \mathit{num}_{i-1} - \mathit{size}_{i-1}) \\
&= \mathit{num}_i + (2 \cdot \mathit{num}_i - 2 \cdot (\mathit{num}_i - 1)) - (2 (\mathit{num}_i - 1) - (\mathit{num}_i - 1)) \\
&= \mathit{num}_i + 2 - (\mathit{num}_i - 1) \\
&= 3 .
\end{align*}

Figure 17.3 The effect of a sequence of n TABLE-INSERT operations on the number $\mathit{num}_i$ of items
in the table, the number $\mathit{size}_i$ of slots in the table, and the potential $\Phi_i = 2 \cdot \mathit{num}_i - \mathit{size}_i$, each
being measured after the ith operation. The thin line shows $\mathit{num}_i$, the dashed line shows $\mathit{size}_i$, and
the thick line shows $\Phi_i$. Notice that immediately before an expansion, the potential has built up to
the number of items in the table, and therefore it can pay for moving all the items to the new table.
Afterwards, the potential drops to 0, but it is immediately increased by 2 upon inserting the item that
caused the expansion.

Figure 17.3 plots the values of $\mathit{num}_i$, $\mathit{size}_i$, and $\Phi_i$ against i. Notice how the
potential builds to pay for expanding the table.
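
The two derivations can be cross-checked by simulation. The sketch below tracks only the counts num and size (it stores no items) and evaluates the potential 2 · num - size before and after each insertion; `phi` and `amortized_insert_costs` are hypothetical names. Under this sketch the very first insertion, which allocates the one-slot table rather than expanding, comes out at 2, and every later insertion at exactly 3.

```python
def phi(num, size):
    """Potential of a table holding num items in size slots."""
    return 2 * num - size


def amortized_insert_costs(n):
    """Amortized cost of each of n TABLE-INSERTs on an initially empty table."""
    num = 0
    size = 0
    costs = []
    for _ in range(n):
        phi_before = phi(num, size)
        if size == 0:
            size = 1          # initial one-slot allocation, not an expansion
        actual = 1            # the elementary insertion of the new item
        if num == size:
            actual += num     # expansion: copy every existing item
            size *= 2
        num += 1
        assert phi(num, size) >= 0
        costs.append(actual + phi(num, size) - phi_before)
    return costs
```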

17.4.2  Table  expansion  and  contraction 

To implement a Table-Delete operation, it is simple enough to remove the
specified item from the table. In order to limit the amount of wasted space,
however, we might wish to contract the table when the load factor becomes too
small. Table contraction is analogous to table expansion: when the number of
items in the table drops too low, we allocate a new, smaller table and then copy
the items from the old table into the new one. We can then free the storage for
the old table by returning it to the memory-management system. Ideally, we would
like to preserve two properties:

•  the  load  factor  of  the  dynamic  table  is  bounded  below  by  a  positive  constant, 
and 

• the amortized cost of a table operation is bounded above by a constant.

We assume that we measure the cost in terms of elementary insertions and
deletions.

You might think that we should double the table size upon inserting an item into
a full table and halve the size when deleting an item would cause the table to
become less than half full. This strategy would guarantee that the load factor
of the table never drops below 1/2, but unfortunately, it can cause the
amortized cost of an operation to be quite large. Consider the following
scenario. We perform $n$ operations on a table $T$, where $n$ is an exact power
of 2. The first $n/2$ operations are insertions, which by our previous analysis
cost a total of $\Theta(n)$. At the end of this sequence of insertions,
$T.num = T.size = n/2$. For the second $n/2$ operations, we perform the
following sequence:

insert,  delete,  delete,  insert,  insert,  delete,  delete,  insert,  insert,  .... 

The first insertion causes the table to expand to size $n$. The two following
deletions cause the table to contract back to size $n/2$. Two further insertions
cause another expansion, and so forth. The cost of each expansion and
contraction is $\Theta(n)$, and there are $\Theta(n)$ of them. Thus, the total
cost of the $n$ operations is $\Theta(n^2)$, making the amortized cost of an
operation $\Theta(n)$.

The  downside  of  this  strategy  is  obvious:  after  expanding  the  table,  we  do  not 
delete  enough  items  to  pay  for  a  contraction.  Likewise,  after  contracting  the  table, 
we  do  not  insert  enough  items  to  pay  for  an  expansion. 
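
The scenario above is easy to reproduce by counting elementary operations. The
Python sketch below is a simulation under stated assumptions: the function
`total_cost` and its parameter `shrink_below` are names invented for this
example, only the counters are tracked, and each expansion or contraction is
charged one unit per item copied. It runs $n/2$ insertions followed by the
pattern insert, delete, delete, insert, and lets the contraction threshold vary.

```python
def total_cost(n, shrink_below):
    """Total elementary cost of the adversarial n-operation sequence.

    The table doubles when an insertion finds it full and halves when a
    deletion leaves the load factor strictly below `shrink_below`.
    """
    num = size = 0
    cost = 0

    def insert():
        nonlocal num, size, cost
        if num == size:               # full: expand and copy all items
            cost += num
            size = max(1, 2 * size)
        num += 1
        cost += 1

    def delete():
        nonlocal num, size, cost
        num -= 1
        cost += 1
        if size > 1 and num < shrink_below * size:
            cost += num               # contract and copy the remaining items
            size //= 2

    for _ in range(n // 2):           # first half: insertions only
        insert()
    pattern = [insert, delete, delete, insert]   # I, D, D, I, I, D, D, I, ...
    for j in range(n // 2):           # second half: the thrashing pattern
        pattern[j % 4]()
    return cost


if __name__ == "__main__":
    for n in (256, 512, 1024, 2048):
        print(n, total_cost(n, 0.5), total_cost(n, 0.25))
```

With a threshold of 1/2 the total grows roughly fourfold each time $n$ doubles,
while a threshold of 1/4 keeps the total linear in $n$ for this sequence.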

We  can  improve  upon  this  strategy  by  allowing  the  load  factor  of  the  table  to 
drop  below  1/2.  Specifically,  we  continue  to  double  the  table  size  upon  inserting 
an  item  into  a  full  table,  but  we  halve  the  table  size  when  deleting  an  item  causes 
the  table  to  become  less  than  1/4  full,  rather  than  1/2  full  as  before.  The  load 
factor  of  the  table  is  therefore  bounded  below  by  the  constant  1/4. 

Intuitively,  we  would  consider  a  load  factor  of  1/2  to  be  ideal,  and  the  table’s 
potential  would  then  be  0.  As  the  load  factor  deviates  from  1/2,  the  potential 
increases  so  that  by  the  time  we  expand  or  contract  the  table,  the  table  has  garnered 
sufficient  potential  to  pay  for  copying  all  the  items  into  the  newly  allocated  table. 
Thus, we will need a potential function that has grown to $T.num$ by the time that
the  load  factor  has  either  increased  to  1  or  decreased  to  1/4.  After  either  expanding 
or  contracting  the  table,  the  load  factor  goes  back  to  1/2  and  the  table’s  potential 
reduces  back  to  0. 

We  omit  the  code  for  Table-Delete,  since  it  is  analogous  to  Table-Insert. 
For  our  analysis,  we  shall  assume  that  whenever  the  number  of  items  in  the  table 
drops  to  0,  we  free  the  storage  for  the  table.  That  is,  if  T.num  =  0,  then  T.size  =  0. 
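
Although the pseudocode for Table-Delete is omitted, one possible shape of the
operation can be sketched in Python. This is not the book's procedure: the class
name `DynamicTable`, the helper `_resize`, and the simplification that a
deletion removes the most recently inserted item (rather than a specified item)
are all assumptions of this example. The table doubles when full, halves when a
deletion leaves it less than 1/4 full, and frees its storage when it becomes
empty.

```python
class DynamicTable:
    """Array-backed table: doubles when full, halves when under 1/4 full."""

    def __init__(self):
        self.slots = []   # backing storage; len(self.slots) plays the role of T.size
        self.num = 0      # number of stored items, T.num

    @property
    def size(self):
        return len(self.slots)

    def _resize(self, new_size):
        new_slots = [None] * new_size
        new_slots[:self.num] = self.slots[:self.num]   # copy the live items
        self.slots = new_slots

    def insert(self, item):
        if self.num == self.size:                      # full, or size 0
            self._resize(max(1, 2 * self.size))
        self.slots[self.num] = item
        self.num += 1

    def delete(self):
        """Remove and return the most recently inserted item (a simplification)."""
        if self.num == 0:
            raise IndexError("delete from empty table")
        self.num -= 1
        item = self.slots[self.num]
        self.slots[self.num] = None
        if self.num == 0:
            self.slots = []                            # free the storage: size 0
        elif self.num < self.size / 4:
            self._resize(self.size // 2)               # contract by halving
        return item
```

After 100 insertions such a table has 128 slots; deleting everything again
shrinks it step by step while the load factor of the nonempty table stays at
least 1/4.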

We can now use the potential method to analyze the cost of a sequence of $n$
Table-Insert and Table-Delete operations. We start by defining a potential
function $\Phi$ that is 0 immediately after an expansion or contraction and
builds as the load factor increases to 1 or decreases to 1/4. Let us denote the
load factor of a nonempty table $T$ by $\alpha(T) = T.num / T.size$. Since for an
empty table, $T.num = T.size = 0$ and $\alpha(T) = 1$, we always have
$T.num = \alpha(T) \cdot T.size$, whether the table is empty or not. We shall use
as our potential function

\[
\Phi(T) =
\begin{cases}
2 \cdot T.num - T.size & \text{if } \alpha(T) \ge 1/2 , \\
T.size/2 - T.num       & \text{if } \alpha(T) < 1/2 .
\end{cases}
\tag{17.6}
\]

Observe that the potential of an empty table is 0 and that the potential is never
negative. Thus, the total amortized cost of a sequence of operations with respect
to $\Phi$ provides an upper bound on the actual cost of the sequence.

Before proceeding with a precise analysis, we pause to observe some properties
of the potential function, as illustrated in Figure 17.4. Notice that when the
load factor is 1/2, the potential is 0. When the load factor is 1, we have
$T.size = T.num$, which implies $\Phi(T) = T.num$, and thus the potential can pay
for an expansion if an item is inserted. When the load factor is 1/4, we have
$T.size = 4 \cdot T.num$, which implies $\Phi(T) = T.num$, and thus the potential
can pay for a contraction if an item is deleted.

Figure 17.4 The effect of a sequence of $n$ Table-Insert and Table-Delete
operations on the number $num_i$ of items in the table, the number $size_i$ of
slots in the table, and the potential

\[
\Phi_i =
\begin{cases}
2 \cdot num_i - size_i & \text{if } \alpha_i \ge 1/2 , \\
size_i/2 - num_i       & \text{if } \alpha_i < 1/2 ,
\end{cases}
\]

each measured after the $i$th operation. The thin line shows $num_i$, the dashed
line shows $size_i$, and the thick line shows $\Phi_i$. Notice that immediately
before an expansion, the potential has built up to the number of items in the
table, and therefore it can pay for moving all the items to the new table.
Likewise, immediately before a contraction, the potential has built up to the
number of items in the table.

To analyze a sequence of $n$ Table-Insert and Table-Delete operations, we let
$c_i$ denote the actual cost of the $i$th operation, $\hat{c}_i$ denote its
amortized cost with respect to $\Phi$, $num_i$ denote the number of items stored
in the table after the $i$th operation, $size_i$ denote the total size of the
table after the $i$th operation, $\alpha_i$ denote the load factor of the table
after the $i$th operation, and $\Phi_i$ denote the potential after the $i$th
operation. Initially, $num_0 = 0$, $size_0 = 0$, $\alpha_0 = 1$, and $\Phi_0 = 0$.

We start with the case in which the $i$th operation is Table-Insert. The
analysis is identical to that for table expansion in Section 17.4.1 if
$\alpha_{i-1} \ge 1/2$. Whether the table expands or not, the amortized cost
$\hat{c}_i$ of the operation is at most 3. If $\alpha_{i-1} < 1/2$, the table
cannot expand as a result of the operation, since the table expands only when
$\alpha_{i-1} = 1$. If $\alpha_i < 1/2$ as well, then the amortized cost of the
$i$th operation is

\[
\begin{aligned}
\hat{c}_i &= c_i + \Phi_i - \Phi_{i-1} \\
          &= 1 + (size_i/2 - num_i) - (size_{i-1}/2 - num_{i-1}) \\
          &= 1 + (size_i/2 - num_i) - (size_i/2 - (num_i - 1)) \\
          &= 0 .
\end{aligned}
\]

If $\alpha_{i-1} < 1/2$ but $\alpha_i \ge 1/2$, then

\[
\begin{aligned}
\hat{c}_i &= c_i + \Phi_i - \Phi_{i-1} \\
          &= 1 + (2 \cdot num_i - size_i) - (size_{i-1}/2 - num_{i-1}) \\
          &= 1 + (2(num_{i-1} + 1) - size_{i-1}) - (size_{i-1}/2 - num_{i-1}) \\
          &= 3 \cdot num_{i-1} - \tfrac{3}{2} size_{i-1} + 3 \\
          &= 3 \alpha_{i-1} size_{i-1} - \tfrac{3}{2} size_{i-1} + 3 \\
          &< \tfrac{3}{2} size_{i-1} - \tfrac{3}{2} size_{i-1} + 3 \\
          &= 3 .
\end{aligned}
\]

Thus,  the  amortized  cost  of  a  Table-Insert  operation  is  at  most  3. 

We now turn to the case in which the $i$th operation is Table-Delete. In this
case, $num_i = num_{i-1} - 1$. If $\alpha_{i-1} < 1/2$, then we must consider
whether the operation causes the table to contract. If it does not, then
$size_i = size_{i-1}$ and the amortized cost of the operation is

\[
\begin{aligned}
\hat{c}_i &= c_i + \Phi_i - \Phi_{i-1} \\
          &= 1 + (size_i/2 - num_i) - (size_{i-1}/2 - num_{i-1}) \\
          &= 1 + (size_i/2 - num_i) - (size_i/2 - (num_i + 1)) \\
          &= 2 .
\end{aligned}
\]

If $\alpha_{i-1} < 1/2$ and the $i$th operation does trigger a contraction, then
the actual cost of the operation is $c_i = num_i + 1$, since we delete one item
and move $num_i$ items. We have $size_i/2 = size_{i-1}/4 = num_{i-1} = num_i + 1$,
and the amortized cost of the operation is

\[
\begin{aligned}
\hat{c}_i &= c_i + \Phi_i - \Phi_{i-1} \\
          &= (num_i + 1) + (size_i/2 - num_i) - (size_{i-1}/2 - num_{i-1}) \\
          &= (num_i + 1) + ((num_i + 1) - num_i) - ((2 \cdot num_i + 2) - (num_i + 1)) \\
          &= 1 .
\end{aligned}
\]

When the $i$th operation is a Table-Delete and $\alpha_{i-1} \ge 1/2$, the
amortized cost is also bounded above by a constant. We leave the analysis as
Exercise 17.4-2.

In summary, since the amortized cost of each operation is bounded above by a
constant, the actual time for any sequence of $n$ operations on a dynamic table
is $O(n)$.
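
The case analysis can be spot-checked numerically. The Python sketch below is an
experiment rather than a proof, and its names (`potential`, `costs`,
`random_ops`) are invented for this example. It tracks only the counters,
charges 1 per elementary insertion or deletion plus one unit per item copied,
assumes that freeing an empty table costs nothing extra, and evaluates potential
function (17.6) after every operation of a random sequence.

```python
import random


def potential(num, size):
    """Potential (17.6); an empty table (num = size = 0) gets potential 0."""
    if 2 * num >= size:               # load factor at least 1/2
        return 2 * num - size
    return size / 2 - num             # load factor below 1/2


def costs(ops):
    """Yield (actual, amortized) cost for each op, 'I' or 'D', from an empty table."""
    num = size = 0
    phi_prev = 0
    for op in ops:
        cost = 1
        if op == 'I':
            if num == size:           # full: double and copy
                cost += num
                size = max(1, 2 * size)
            num += 1
        else:
            num -= 1
            if num == 0:
                size = 0              # free the storage
            elif 4 * num < size:      # under 1/4 full: halve and copy
                cost += num
                size //= 2
        phi = potential(num, size)
        yield cost, cost + phi - phi_prev
        phi_prev = phi


def random_ops(n, seed=0):
    """Random sequence of n operations that never deletes from an empty table."""
    rng = random.Random(seed)
    ops, num = [], 0
    for _ in range(n):
        op = 'I' if num == 0 or rng.random() < 0.5 else 'D'
        num += 1 if op == 'I' else -1
        ops.append(op)
    return ops
```

For any sequence generated this way, every amortized cost is at most 3 and the
sum of the actual costs is at most the sum of the amortized costs, which matches
the $O(n)$ bound.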

Exercises 


17.4-1 

Suppose that we wish to implement a dynamic, open-address hash table. Why
might we consider the table to be full when its load factor reaches some value
$\alpha$ that is strictly less than 1? Describe briefly how to make insertion
into a dynamic, open-address hash table run in such a way that the expected
value of the amortized cost per insertion is $O(1)$. Why is the expected value of
the actual cost per insertion not necessarily $O(1)$ for all insertions?


17.4-2 

Show that if $\alpha_{i-1} \ge 1/2$ and the $i$th operation on a dynamic table is
Table-Delete, then the amortized cost of the operation with respect to the
potential function (17.6) is bounded above by a constant.


17.4-3 

Suppose that instead of contracting a table by halving its size when its load
factor drops below 1/4, we contract it by multiplying its size by 2/3 when its
load factor drops below 1/3. Using the potential function

\[ \Phi(T) = |2 \cdot T.num - T.size| , \]

show that the amortized cost of a Table-Delete that uses this strategy is bounded
above by a constant.

Problems


17-1  Bit-reversed  binary  counter 

Chapter 30 examines an important algorithm called the fast Fourier transform,
or FFT. The first step of the FFT algorithm performs a bit-reversal permutation
on an input array $A[0 \,..\, n-1]$ whose length is $n = 2^k$ for some nonnegative
integer $k$. This permutation swaps elements whose indices have binary
representations that are the reverse of each other.

We can express each index $a$ as a $k$-bit sequence
$\langle a_{k-1}, a_{k-2}, \ldots, a_0 \rangle$, where
$a = \sum_{i=0}^{k-1} a_i 2^i$. We define

\[ rev_k(\langle a_{k-1}, a_{k-2}, \ldots, a_0 \rangle) = \langle a_0, a_1, \ldots, a_{k-1} \rangle ; \]

thus,

\[ rev_k(a) = \sum_{i=0}^{k-1} a_{k-i-1} 2^i . \]

For example, if $n = 16$ (or, equivalently, $k = 4$), then $rev_k(3) = 12$, since
the 4-bit representation of 3 is 0011, which when reversed gives 1100, the 4-bit
representation of 12.
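
To make the definition concrete, here is one straightforward way to compute
$rev_k$ in Python. It is a sketch of the definition only, not part of the
problem's intended solution; the function name `rev` and its argument order are
choices made for this example. It peels bits off the low end of $a$ and appends
them to the result, using $k$ loop iterations.

```python
def rev(k, a):
    """Return the integer whose k-bit representation is that of a, reversed."""
    result = 0
    for _ in range(k):
        result = (result << 1) | (a & 1)   # append the lowest remaining bit of a
        a >>= 1
    return result


if __name__ == "__main__":
    print(rev(4, 3))   # the example from the text: 0011 reversed is 1100
```

Applying `rev` twice with the same `k` returns the original value, which is why
the permutation can be carried out by swapping pairs of elements.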

a. Given a function $rev_k$ that runs in $\Theta(k)$ time, write an algorithm to
perform the bit-reversal permutation on an array of length $n = 2^k$ in $O(nk)$
time.

We can use an algorithm based on an amortized analysis to improve the running
time of the bit-reversal permutation. We maintain a "bit-reversed counter" and a
procedure Bit-Reversed-Increment that, when given a bit-reversed-counter
value $a$, produces $rev_k(rev_k(a) + 1)$. If $k = 4$, for example, and the
bit-reversed counter starts at 0, then successive calls to
Bit-Reversed-Increment produce the sequence

0000, 1000, 0100, 1100, 0010, 1010, ... = 0, 8, 4, 12, 2, 10, ... .

b. Assume that the words in your computer store $k$-bit values and that in unit
time, your computer can manipulate the binary values with operations such as
shifting left or right by arbitrary amounts, bitwise-AND, bitwise-OR, etc.
Describe an implementation of the Bit-Reversed-Increment procedure that allows
the bit-reversal permutation on an $n$-element array to be performed in a total
of $O(n)$ time.

c. Suppose that you can shift a word left or right by only one bit in unit time.
Is it still possible to implement an $O(n)$-time bit-reversal permutation?

17-2 Making binary search dynamic

Binary  search  of  a  sorted  array  takes  logarithmic  search  time,  but  the  time  to  insert 
a  new  element  is  linear  in  the  size  of  the  array.  We  can  improve  the  time  for 
insertion  by  keeping  several  sorted  arrays. 

Specifically, suppose that we wish to support Search and Insert on a set
of $n$ elements. Let $k = \lceil \lg(n + 1) \rceil$, and let the binary
representation of $n$ be $\langle n_{k-1}, n_{k-2}, \ldots, n_0 \rangle$. We have
$k$ sorted arrays $A_0, A_1, \ldots, A_{k-1}$, where for $i = 0, 1, \ldots, k-1$,
the length of array $A_i$ is $2^i$. Each array is either full or empty, depending
on whether $n_i = 1$ or $n_i = 0$, respectively. The total number of elements
held in all $k$ arrays is therefore $\sum_{i=0}^{k-1} n_i 2^i = n$. Although each
individual array is sorted, elements in different arrays bear no particular
relationship to each other.

a.  Describe  how  to  perform  the  SEARCH  operation  for  this  data  structure.  Analyze 
its  worst-case  running  time. 

b.  Describe  how  to  perform  the  INSERT  operation.  Analyze  its  worst-case  and 
amortized  running  times. 

c.  Discuss  how  to  implement  Delete. 

17-3 Amortized weight-balanced trees

Consider an ordinary binary search tree augmented by adding to each node $x$ the
attribute $x.size$ giving the number of keys stored in the subtree rooted at $x$.
Let $\alpha$ be a constant in the range $1/2 \le \alpha < 1$. We say that a given
node $x$ is $\alpha$-balanced if $x.left.size \le \alpha \cdot x.size$ and
$x.right.size \le \alpha \cdot x.size$. The tree as a whole is $\alpha$-balanced
if every node in the tree is $\alpha$-balanced. The following amortized approach
to maintaining weight-balanced trees was suggested by G. Varghese.

a. A 1/2-balanced tree is, in a sense, as balanced as it can be. Given a node $x$
in an arbitrary binary search tree, show how to rebuild the subtree rooted at $x$
so that it becomes 1/2-balanced. Your algorithm should run in time
$\Theta(x.size)$, and it can use $O(x.size)$ auxiliary storage.

b. Show that performing a search in an $n$-node $\alpha$-balanced binary search
tree takes $O(\lg n)$ worst-case time.

For the remainder of this problem, assume that the constant $\alpha$ is strictly
greater than 1/2. Suppose that we implement Insert and Delete as usual for an
$n$-node binary search tree, except that after every such operation, if any node
in the tree is no longer $\alpha$-balanced, then we "rebuild" the subtree rooted
at the highest such node in the tree so that it becomes 1/2-balanced.

We shall analyze this rebuilding scheme using the potential method. For a node
$x$ in a binary search tree $T$, we define

\[ \Delta(x) = |x.left.size - x.right.size| , \]

and we define the potential of $T$ as

\[ \Phi(T) = c \sum_{x \in T : \Delta(x) \ge 2} \Delta(x) , \]

where $c$ is a sufficiently large constant that depends on $\alpha$.

c. Argue that any binary search tree has nonnegative potential and that a
1/2-balanced tree has potential 0.

d. Suppose that $m$ units of potential can pay for rebuilding an $m$-node
subtree. How large must $c$ be in terms of $\alpha$ in order for it to take
$O(1)$ amortized time to rebuild a subtree that is not $\alpha$-balanced?

e. Show that inserting a node into or deleting a node from an $n$-node
$\alpha$-balanced tree costs $O(\lg n)$ amortized time.

17-4  The  cost  of  restructuring  red-black  trees 

There are four basic operations on red-black trees that perform structural
modifications: node insertions, node deletions, rotations, and color changes. We
have seen that RB-Insert and RB-Delete use only $O(1)$ rotations, node
insertions, and node deletions to maintain the red-black properties, but they
may make many more color changes.

a. Describe a legal red-black tree with $n$ nodes such that calling RB-Insert to
add the $(n + 1)$st node causes $\Omega(\lg n)$ color changes. Then describe a
legal red-black tree with $n$ nodes for which calling RB-Delete on a particular
node causes $\Omega(\lg n)$ color changes.

Although the worst-case number of color changes per operation can be logarithmic,
we shall prove that any sequence of $m$ RB-Insert and RB-Delete operations on
an initially empty red-black tree causes $O(m)$ structural modifications in the
worst case. Note that we count each color change as a structural modification.

b. Some of the cases handled by the main loop of the code of both
RB-Insert-Fixup and RB-Delete-Fixup are terminating: once encountered, they cause
the loop to terminate after a constant number of additional operations. For each
of the cases of RB-Insert-Fixup and RB-Delete-Fixup, specify which are
terminating and which are not. (Hint: Look at Figures 13.5, 13.6, and 13.7.)

We shall first analyze the structural modifications when only insertions are
performed. Let $T$ be a red-black tree, and define $\Phi(T)$ to be the number of
red nodes in $T$. Assume that 1 unit of potential can pay for the structural
modifications performed by any of the three cases of RB-Insert-Fixup.

c. Let $T'$ be the result of applying Case 1 of RB-Insert-Fixup to $T$. Argue
that $\Phi(T') = \Phi(T) - 1$.

d. When we insert a node into a red-black tree using RB-Insert, we can break
the operation into three parts. List the structural modifications and potential
changes resulting from lines 1-16 of RB-Insert, from nonterminating cases
of RB-Insert-Fixup, and from terminating cases of RB-Insert-Fixup.

e. Using part (d), argue that the amortized number of structural modifications
performed by any call of RB-Insert is $O(1)$.

We now wish to prove that there are $O(m)$ structural modifications when there
are both insertions and deletions. Let us define, for each node $x$,

\[
w(x) =
\begin{cases}
0 & \text{if $x$ is red} , \\
1 & \text{if $x$ is black and has no red children} , \\
0 & \text{if $x$ is black and has one red child} , \\
2 & \text{if $x$ is black and has two red children} .
\end{cases}
\]

Now we redefine the potential of a red-black tree $T$ as

\[ \Phi(T) = \sum_{x \in T} w(x) , \]

and let $T'$ be the tree that results from applying any nonterminating case of
RB-Insert-Fixup or RB-Delete-Fixup to $T$.

f. Show that $\Phi(T') \le \Phi(T) - 1$ for all nonterminating cases of
RB-Insert-Fixup. Argue that the amortized number of structural modifications
performed by any call of RB-Insert-Fixup is $O(1)$.

g. Show that $\Phi(T') \le \Phi(T) - 1$ for all nonterminating cases of
RB-Delete-Fixup. Argue that the amortized number of structural modifications
performed by any call of RB-Delete-Fixup is $O(1)$.

h. Complete the proof that in the worst case, any sequence of $m$ RB-Insert and
RB-Delete operations performs $O(m)$ structural modifications.

17-5 Competitive analysis of self-organizing lists with move-to-front

A self-organizing list is a linked list of $n$ elements, in which each element
has a unique key. When we search for an element in the list, we are given a key,
and we want to find an element with that key.

A self-organizing list has two important properties:

1. To find an element in the list, given its key, we must traverse the list from
the beginning until we encounter the element with the given key. If that element
is the $k$th element from the start of the list, then the cost to find the
element is $k$.

2. We may reorder the list elements after any operation, according to a given
rule with a given cost. We may choose any heuristic we like to decide how to
reorder the list.

Assume that we start with a given list of $n$ elements, and we are given an
access sequence $\sigma = \langle \sigma_1, \sigma_2, \ldots, \sigma_m \rangle$ of
keys to find, in order. The cost of the sequence is the sum of the costs of the
individual accesses in the sequence.

Out of the various possible ways to reorder the list after an operation, this
problem focuses on transposing adjacent list elements (switching their positions
in the list), with a unit cost for each transpose operation. You will show, by
means of a potential function, that a particular heuristic for reordering the
list, move-to-front, entails a total cost no worse than 4 times that of any other
heuristic for maintaining the list order, even if the other heuristic knows the
access sequence in advance! We call this type of analysis a competitive analysis.

For a heuristic H and a given initial ordering of the list, denote the access
cost of sequence $\sigma$ by $C_H(\sigma)$. Let $m$ be the number of accesses in
$\sigma$.

a. Argue that if heuristic H does not know the access sequence in advance, then
the worst-case cost for H on an access sequence $\sigma$ is
$C_H(\sigma) = \Omega(mn)$.

With the move-to-front heuristic, immediately after searching for an element $x$,
we move $x$ to the first position on the list (i.e., the front of the list).

Let $rank_L(x)$ denote the rank of element $x$ in list $L$, that is, the position
of $x$ in list $L$. For example, if $x$ is the fourth element in $L$, then
$rank_L(x) = 4$. Let $c_i$ denote the cost of access $\sigma_i$ using the
move-to-front heuristic, which includes the cost of finding the element in the
list and the cost of moving it to the front of the list by a series of
transpositions of adjacent list elements.

b. Show that if $\sigma_i$ accesses element $x$ in list $L$ using the
move-to-front heuristic, then $c_i = 2 \cdot rank_L(x) - 1$.

Now we compare move-to-front with any other heuristic H that processes an
access sequence according to the two properties above. Heuristic H may transpose
elements in the list in any way it wants, and it might even know the entire
access sequence in advance.

Let $L_i$ be the list after access $\sigma_i$ using move-to-front, and let
$L_i^*$ be the list after access $\sigma_i$ using heuristic H. We denote the cost
of access $\sigma_i$ by $c_i$ for move-to-front and by $c_i^*$ for heuristic H.
Suppose that heuristic H performs $t_i^*$ transpositions during access
$\sigma_i$.

c. In part (b), you showed that $c_i = 2 \cdot rank_{L_{i-1}}(x) - 1$. Now show
that $c_i^* = rank_{L_{i-1}^*}(x) + t_i^*$.

We define an inversion in list $L_i$ as a pair of elements $y$ and $z$ such that
$y$ precedes $z$ in $L_i$ and $z$ precedes $y$ in list $L_i^*$. Suppose that list
$L_i$ has $q_i$ inversions after processing the access sequence
$\langle \sigma_1, \sigma_2, \ldots, \sigma_i \rangle$. Then, we define a
potential function $\Phi$ that maps $L_i$ to a real number by
$\Phi(L_i) = 2 q_i$. For example, if $L_i$ has the elements
$\langle e, c, a, d, b \rangle$ and $L_i^*$ has the elements
$\langle c, a, b, d, e \rangle$, then $L_i$ has 5 inversions
$((e, c), (e, a), (e, d), (e, b), (d, b))$, and so $\Phi(L_i) = 10$. Observe that
$\Phi(L_i) \ge 0$ for all $i$ and that, if move-to-front and heuristic H start
with the same list $L_0$, then $\Phi(L_0) = 0$.
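
The worked example of inversions can be checked mechanically. The short Python
sketch below only illustrates the definition; the function name `inversions` and
the variable names are invented for this example. It lists every pair $(y, z)$
in which $y$ precedes $z$ in the move-to-front list while $z$ precedes $y$ in the
other list, and then doubles the count to obtain the potential.

```python
from itertools import combinations


def inversions(mtf_list, other_list):
    """Pairs (y, z) with y before z in mtf_list but z before y in other_list."""
    pos = {key: i for i, key in enumerate(other_list)}
    # combinations() yields pairs in list order, so y always precedes z in mtf_list.
    return [(y, z) for y, z in combinations(mtf_list, 2) if pos[z] < pos[y]]


L_i = ['e', 'c', 'a', 'd', 'b']        # list maintained by move-to-front
L_star = ['c', 'a', 'b', 'd', 'e']     # list maintained by heuristic H
pairs = inversions(L_i, L_star)
potential = 2 * len(pairs)

if __name__ == "__main__":
    print(pairs, potential)
```

This brute-force check examines all pairs, so it takes quadratic time in the
list length; it is meant for verifying small examples, not for the analysis
itself.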

d. Argue that a transposition either increases the potential by 2 or decreases
the potential by 2.

Suppose that access $\sigma_i$ finds the element $x$. To understand how the
potential changes due to $\sigma_i$, let us partition the elements other than $x$
into four sets, depending on where they are in the lists just before the $i$th
access:

• Set $A$ consists of elements that precede $x$ in both $L_{i-1}$ and $L_{i-1}^*$.

• Set $B$ consists of elements that precede $x$ in $L_{i-1}$ and follow $x$ in $L_{i-1}^*$.

• Set $C$ consists of elements that follow $x$ in $L_{i-1}$ and precede $x$ in $L_{i-1}^*$.

• Set $D$ consists of elements that follow $x$ in both $L_{i-1}$ and $L_{i-1}^*$.

e. Argue that $rank_{L_{i-1}}(x) = |A| + |B| + 1$ and
$rank_{L_{i-1}^*}(x) = |A| + |C| + 1$.

f. Show that access $\sigma_i$ causes a change in potential of

\[ \Phi(L_i) - \Phi(L_{i-1}) \le 2(|A| - |B| + t_i^*) , \]

where, as before, heuristic H performs $t_i^*$ transpositions during access
$\sigma_i$.

Define the amortized cost $\hat{c}_i$ of access $\sigma_i$ by
$\hat{c}_i = c_i + \Phi(L_i) - \Phi(L_{i-1})$.

g. Show that the amortized cost $\hat{c}_i$ of access $\sigma_i$ is bounded from
above by $4 c_i^*$.

h. Conclude that the cost $C_{MTF}(\sigma)$ of access sequence $\sigma$ with
move-to-front is at most 4 times the cost $C_H(\sigma)$ of $\sigma$ with any
other heuristic H, assuming that both heuristics start with the same list.

Chapter notes

Aho, Hopcroft, and Ullman [5] used aggregate analysis to determine the running
time of operations on a disjoint-set forest; we shall analyze this data structure
using the potential method in Chapter 21. Tarjan [331] surveys the accounting and
potential methods of amortized analysis and presents several applications. He
attributes the accounting method to several authors, including M. R. Brown,
R. E. Tarjan, S. Huddleston, and K. Mehlhorn. He attributes the potential method
to D. D. Sleator. The term "amortized" is due to D. D. Sleator and R. E. Tarjan.

Potential functions are also useful for proving lower bounds for certain types of
problems. For each configuration of the problem, we define a potential function
that maps the configuration to a real number. Then we determine the potential
$\Phi_{init}$ of the initial configuration, the potential $\Phi_{final}$ of the
final configuration, and the maximum change in potential $\Delta\Phi_{max}$ due
to any step. The number of steps must therefore be at least
$|\Phi_{final} - \Phi_{init}| / |\Delta\Phi_{max}|$. Examples of potential
functions to prove lower bounds in I/O complexity appear in works by Cormen,
Sundquist, and Wisniewski [79]; Floyd [107]; and Aggarwal and Vitter [3]. Krumme,
Cybenko, and Venkataraman [221] applied potential functions to prove lower bounds
on gossiping: communicating a unique item from each vertex in a graph to every
other vertex.

The move-to-front heuristic from Problem 17-5 works quite well in practice.
Moreover, if we recognize that when we find an element, we can splice it out of
its position in the list and relocate it to the front of the list in constant
time, we can show that the cost of move-to-front is at most twice the cost of any
other heuristic including, again, one that knows the entire access sequence in
advance.

This part returns to studying data structures that support operations on dynamic
sets, but at a more advanced level than Part III. Two of the chapters, for
example, make extensive use of the amortized analysis techniques we saw in
Chapter 17.

Chapter 18 presents B-trees, which are balanced search trees specifically
designed to be stored on disks. Because disks operate much more slowly than
random-access memory, we measure the performance of B-trees not only by how
much computing time the dynamic-set operations consume but also by how many
disk accesses they perform. For each B-tree operation, the number of disk
accesses increases with the height of the B-tree, but B-tree operations keep the
height low.

Chapter 19 gives an implementation of a mergeable heap, which supports the
operations Insert, Minimum, Extract-Min, and Union.¹ The Union operation unites,
or merges, two heaps. Fibonacci heaps, the data structure in Chapter 19, also
support the operations Delete and Decrease-Key. We use amortized time bounds to
measure the performance of Fibonacci heaps. The operations Insert, Minimum, and
Union take only $O(1)$ actual and amortized time on Fibonacci heaps, and the
operations Extract-Min and Delete take $O(\lg n)$ amortized time. The most
significant advantage of Fibonacci heaps, however, is that Decrease-Key takes
only $O(1)$ amortized time. Because the Decrease-Key operation takes constant
amortized time, Fibonacci heaps are key components of some of the asymptotically
fastest algorithms to date for graph problems.

¹ As in Problem 10-2, we have defined a mergeable heap to support Minimum and
Extract-Min, and so we can also refer to it as a mergeable min-heap.
Alternatively, if it supported Maximum and Extract-Max, it would be a mergeable
max-heap. Unless we specify otherwise, mergeable heaps will be by default
mergeable min-heaps.

Noting that we can beat the $\Omega(n \lg n)$ lower bound for sorting when the
keys are integers in a restricted range, Chapter 20 asks whether we can design a
data structure that supports the dynamic-set operations Search, Insert, Delete,
Minimum, Maximum, Successor, and Predecessor in $o(\lg n)$ time when the keys
are integers in a restricted range. The answer turns out to be that we can, by
using a recursive data structure known as a van Emde Boas tree. If the keys are
unique integers drawn from the set $\{0, 1, 2, \ldots, u - 1\}$, where $u$ is an
exact power of 2, then van Emde Boas trees support each of the above operations
in $O(\lg \lg u)$ time.

Finally,  Chapter  21  presents  data  structures  for  disjoint  sets.  We  have  a  universe 
of  n  elements  that  are  partitioned  into  dynamic  sets.  Initially,  each  element  belongs 
to  its  own  singleton  set.  The  operation  Union  unites  two  sets,  and  the  query  Find- 
Set  identifies  the  unique  set  that  contains  a  given  element  at  the  moment.  By 
representing  each  set  as  a  simple  rooted  tree,  we  obtain  surprisingly  fast  operations: 
a sequence of m operations runs in O(m α(n)) time, where α(n) is an incredibly
slowly growing function; α(n) is at most 4 in any conceivable application. The
amortized  analysis  that  proves  this  time  bound  is  as  complex  as  the  data  structure 
is  simple. 

The  topics  covered  in  this  part  are  by  no  means  the  only  examples  of  “advanced” 
data  structures.  Other  advanced  data  structures  include  the  following: 

•  Dynamic trees, introduced by Sleator and Tarjan [319] and discussed by Tarjan
[330], maintain a forest of disjoint rooted trees. Each edge in each tree has
a real-valued cost. Dynamic trees support queries to find parents, roots, edge
costs, and the minimum edge cost on a simple path from a node up to a root.
Trees may be manipulated by cutting edges, updating all edge costs on a simple
path from a node up to a root, linking a root into another tree, and making a
node the root of the tree it appears in. One implementation of dynamic trees
gives an O(lg n) amortized time bound for each operation; a more complicated
implementation yields O(lg n) worst-case time bounds. Dynamic trees are used
in some of the asymptotically fastest network-flow algorithms.

•  Splay trees, developed by Sleator and Tarjan [320] and, again, discussed by
Tarjan [330], are a form of binary search tree on which the standard search-tree
operations run in O(lg n) amortized time. One application of splay trees
simplifies dynamic trees.

•  Persistent data structures allow queries, and sometimes updates as well, on past
versions of a data structure. Driscoll, Sarnak, Sleator, and Tarjan [97] present
techniques for making linked data structures persistent with only a small time
and space cost. Problem 13-1 gives a simple example of a persistent dynamic
set.

•  As in Chapter 20, several data structures allow a faster implementation of dictionary
operations (Insert, Delete, and Search) for a restricted universe
of keys. By taking advantage of these restrictions, they are able to achieve better
worst-case asymptotic running times than comparison-based data structures.
Fredman and Willard introduced fusion trees [115], which were the first data
structure to allow faster dictionary operations when the universe is restricted to
integers. They showed how to implement these operations in O(lg n / lg lg n)
time. Several subsequent data structures, including exponential search trees
[16], have also given improved bounds on some or all of the dictionary operations
and are mentioned in the chapter notes throughout this book.

•  Dynamic  graph  data  structures  support  various  queries  while  allowing  the 
structure  of  a  graph  to  change  through  operations  that  insert  or  delete  vertices 
or  edges.  Examples  of  the  queries  that  they  support  include  vertex  connectivity 
[166],  edge  connectivity,  minimum  spanning  trees  [165],  biconnectivity,  and 
transitive  closure  [164]. 

Chapter notes throughout this book mention additional data structures.

B-Trees


B-trees are balanced search trees designed to work well on disks or other direct-access
secondary storage devices. B-trees are similar to red-black trees (Chapter 13),
but they are better at minimizing disk I/O operations. Many database systems
use B-trees, or variants of B-trees, to store information.

B-trees  differ  from  red-black  trees  in  that  B-tree  nodes  may  have  many  children, 
from  a  few  to  thousands.  That  is,  the  “branching  factor”  of  a  B-tree  can  be  quite 
large,  although  it  usually  depends  on  characteristics  of  the  disk  unit  used.  B-trees 
are similar to red-black trees in that every n-node B-tree has height O(lg n). The
exact height of a B-tree can be considerably less than that of a red-black tree,
however, because its branching factor, and hence the base of the logarithm that
expresses its height, can be much larger. Therefore, we can also use B-trees to
implement many dynamic-set operations in time O(lg n).

B-trees  generalize  binary  search  trees  in  a  natural  manner.  Figure  18.1  shows  a 
simple  B-tree.  If  an  internal  B-tree  node  x  contains  x.n  keys,  then  x  has  x.n  +  1 
children.  The  keys  in  node  x  serve  as  dividing  points  separating  the  range  of  keys 
handled  by  x  into  x.n  +  1  subranges,  each  handled  by  one  child  of  x.  When 
searching for a key in a B-tree, we make an (x.n + 1)-way decision based on
comparisons  with  the  x.n  keys  stored  at  node  x.  The  structure  of  leaf  nodes  differs 
from  that  of  internal  nodes;  we  will  examine  these  differences  in  Section  18.1. 

Section 18.1 gives a precise definition of B-trees and proves that the height of
a B-tree grows only logarithmically with the number of nodes it contains. Section 18.2
describes how to search for a key and insert a key into a B-tree, and
Section 18.3 discusses deletion. Before proceeding, however, we need to ask why
we evaluate data structures designed to work on a disk differently from data structures
designed to work in main random-access memory.

Data  structures  on  secondary  storage 

Computer systems take advantage of various technologies that provide memory
capacity. The primary memory (or main memory) of a computer system normally
consists of silicon memory chips. This technology is typically more than an order
of magnitude more expensive per bit stored than magnetic storage technology, such
as tapes or disks. Most computer systems also have secondary storage based on
magnetic disks; the amount of such secondary storage often exceeds the amount of
primary memory by at least two orders of magnitude.

Figure 18.1 A B-tree whose keys are the consonants of English. An internal node x containing
x.n keys has x.n + 1 children. All leaves are at the same depth in the tree. The lightly shaded nodes
are examined in a search for the letter R.

Figure 18.2 A typical disk drive. It comprises one or more platters (two platters are shown here)
that rotate around a spindle. Each platter is read and written with a head at the end of an arm. Arms
rotate around a common pivot axis. A track is the surface that passes beneath the read/write head
when the head is stationary.

Figure 18.2 shows a typical disk drive. The drive consists of one or more platters,
which rotate at a constant speed around a common spindle. A magnetizable
material covers the surface of each platter. The drive reads and writes each platter
by a head at the end of an arm. The arms can move their heads toward or away
from the spindle. When a given head is stationary, the surface that passes underneath
it is called a track. Multiple platters increase only the disk drive’s capacity
and not its performance.

Although  disks  are  cheaper  and  have  higher  capacity  than  main  memory,  they  are 
much,  much  slower  because  they  have  moving  mechanical  parts.1  The  mechanical 
motion  has  two  components:  platter  rotation  and  arm  movement.  As  of  this  writing, 
commodity  disks  rotate  at  speeds  of  5400-15,000  revolutions  per  minute  (RPM). 
We  typically  see  15,000  RPM  speeds  in  server-grade  drives,  7200  RPM  speeds 
in  drives  for  desktops,  and  5400  RPM  speeds  in  drives  for  laptops.  Although 
7200  RPM  may  seem  fast,  one  rotation  takes  8.33  milliseconds,  which  is  over  5 
orders  of  magnitude  longer  than  the  50  nanosecond  access  times  (more  or  less) 
commonly found for silicon memory. In other words, if we have to wait a full rotation
for a particular item to come under the read/write head, we could access main
memory more than 100,000 times during that span. On average we have to wait
for only half a rotation, but still, the difference in access times for silicon memory
compared with disks is enormous. Moving the arms also takes some time. As of
this  writing,  average  access  times  for  commodity  disks  are  in  the  range  of  8  to  11 
milliseconds. 
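
The arithmetic behind these figures is easy to reproduce. The following Python sketch computes the time for one full rotation at a given spindle speed and the number of main-memory accesses that fit into that time. The 50-nanosecond memory access time is the rough value quoted above, and the function names are ours, chosen only for illustration.

```python
def rotation_time_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def memory_accesses_per_rotation(rpm, memory_access_ns=50.0):
    """Number of main-memory accesses that fit into one full rotation."""
    rotation_ns = rotation_time_ms(rpm) * 1_000_000.0
    return rotation_ns / memory_access_ns


for rpm in (5400, 7200, 15000):
    print(rpm, "RPM:",
          round(rotation_time_ms(rpm), 2), "ms per rotation,",
          round(memory_accesses_per_rotation(rpm)), "memory accesses")
```

At 7200 RPM the sketch gives about 8.33 milliseconds per rotation, which matches the figure in the text and corresponds to well over 100,000 memory accesses.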

In  order  to  amortize  the  time  spent  waiting  for  mechanical  movements,  disks 
access  not  just  one  item  but  several  at  a  time.  Information  is  divided  into  a  number 
of  equal-sized  pages  of  bits  that  appear  consecutively  within  tracks,  and  each  disk 
read or write is of one or more entire pages. For a typical disk, a page might be 2^11
to 2^14 bytes in length. Once the read/write head is positioned correctly and the disk
has  rotated  to  the  beginning  of  the  desired  page,  reading  or  writing  a  magnetic  disk 
is  entirely  electronic  (aside  from  the  rotation  of  the  disk),  and  the  disk  can  quickly 
read  or  write  large  amounts  of  data. 

Often,  accessing  a  page  of  information  and  reading  it  from  a  disk  takes  longer 
than  examining  all  the  information  read.  For  this  reason,  in  this  chapter  we  shall 
look  separately  at  the  two  principal  components  of  the  running  time: 

•  the  number  of  disk  accesses,  and 

•  the  CPU  (computing)  time. 

We measure the number of disk accesses in terms of the number of pages of information
that need to be read from or written to the disk. We note that disk-access
time is not constant; it depends on the distance between the current track and
the desired track and also on the initial rotational position of the disk. We shall
nonetheless use the number of pages read or written as a first-order approximation
of the total time spent accessing the disk.

1 As of this writing, solid state drives have recently come onto the consumer market. Although they
are faster than mechanical disk drives, they cost more per gigabyte and have lower capacities than
mechanical disk drives.

In  a  typical  B-tree  application,  the  amount  of  data  handled  is  so  large  that  all 
the  data  do  not  fit  into  main  memory  at  once.  The  B-tree  algorithms  copy  selected 
pages  from  disk  into  main  memory  as  needed  and  write  back  onto  disk  the  pages 
that  have  changed.  B-tree  algorithms  keep  only  a  constant  number  of  pages  in 
main  memory  at  any  time;  thus,  the  size  of  main  memory  does  not  limit  the  size  of 
B-trees  that  can  be  handled. 

We  model  disk  operations  in  our  pseudocode  as  follows.  Let  x  be  a  pointer  to  an 
object.  If  the  object  is  currently  in  the  computer’s  main  memory,  then  we  can  refer 
to  the  attributes  of  the  object  as  usual:  x.key,  for  example.  If  the  object  referred  to 
by x resides on disk, however, then we must perform the operation Disk-Read(x)
to read object x into main memory before we can refer to its attributes. (We assume
that if x is already in main memory, then Disk-Read(x) requires no disk
accesses; it is a “no-op.”) Similarly, the operation Disk-Write(x) is used to save
any  changes  that  have  been  made  to  the  attributes  of  object  x.  That  is,  the  typical 
pattern  for  working  with  an  object  is  as  follows: 

x  =  a  pointer  to  some  object 
Disk-Read(x) 

operations  that  access  and/or  modify  the  attributes  of  x 
Disk-Write(x)  //  omitted  if  no  attributes  of  x  were  changed 

other  operations  that  access  but  do  not  modify  attributes  of  x 

The  system  can  keep  only  a  limited  number  of  pages  in  main  memory  at  any  one 
time.  We  shall  assume  that  the  system  flushes  from  main  memory  pages  no  longer 
in  use;  our  B-tree  algorithms  will  ignore  this  issue. 
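
To make the Disk-Read and Disk-Write conventions concrete, here is a minimal Python sketch of a simulated disk that counts page transfers. The class SimulatedDisk, its attributes, and the page identifier "x" are assumptions of ours for illustration; they are not part of the pseudocode. The sketch also ignores the flushing of pages from main memory, just as our B-tree algorithms do.

```python
class SimulatedDisk:
    """Toy stand-in for secondary storage that counts page transfers."""

    def __init__(self):
        self.pages = {}       # page id -> attributes stored "on disk"
        self.in_memory = {}   # page id -> working copy in "main memory"
        self.reads = 0
        self.writes = 0

    def disk_read(self, page_id):
        # Reading a page that is already in memory is a no-op.
        if page_id not in self.in_memory:
            self.reads += 1
            self.in_memory[page_id] = dict(self.pages[page_id])
        return self.in_memory[page_id]

    def disk_write(self, page_id):
        # Save the in-memory copy of the page back to disk.
        self.writes += 1
        self.pages[page_id] = dict(self.in_memory[page_id])


disk = SimulatedDisk()
disk.pages["x"] = {"key": 10}

x = disk.disk_read("x")   # first access costs one disk read
x["key"] = 42             # modify an attribute in main memory
disk.disk_write("x")      # save the change
disk.disk_read("x")       # already in memory: no further disk access
print(disk.reads, disk.writes)
```

Counting reads and writes in this way mirrors how we measure the cost of the B-tree procedures in this chapter.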

Since in most systems the running time of a B-tree algorithm depends primarily
on the number of Disk-Read and Disk-Write operations it performs, we
typically  want  each  of  these  operations  to  read  or  write  as  much  information  as 
possible.  Thus,  a  B-tree  node  is  usually  as  large  as  a  whole  disk  page,  and  this  size 
limits  the  number  of  children  a  B-tree  node  can  have. 

For  a  large  B-tree  stored  on  a  disk,  we  often  see  branching  factors  between  50 
and  2000,  depending  on  the  size  of  a  key  relative  to  the  size  of  a  page.  A  large 
branching  factor  dramatically  reduces  both  the  height  of  the  tree  and  the  number  of 
disk  accesses  required  to  find  any  key.  Figure  18.3  shows  a  B-tree  with  a  branching 
factor  of  1001  and  height  2  that  can  store  over  one  billion  keys;  nevertheless,  since 
we  can  keep  the  root  node  permanently  in  main  memory,  we  can  find  any  key  in 
this tree by making at most only two disk accesses.

(Figure 18.3 node and key counts by depth: depth 0 has 1 node and 1000 keys; depth 1 has 1001 nodes and 1,001,000 keys; depth 2 has 1,002,001 nodes and 1,002,001,000 keys.)

Figure 18.3 A B-tree of height 2 containing over one billion keys. Shown inside each node x
is x.n, the number of keys in x. Each internal node and leaf contains 1000 keys. This B-tree has
1001 nodes at depth 1 and over one million leaves at depth 2.
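
The counts in Figure 18.3 follow from a short calculation, sketched below in Python. The function name keys_per_level is ours, and the sketch assumes that every node holds the same number of keys, as in the figure.

```python
def keys_per_level(keys_per_node, height):
    """Number of keys at each depth when every node holds keys_per_node
    keys and therefore has keys_per_node + 1 children."""
    branching = keys_per_node + 1
    return [branching ** depth * keys_per_node for depth in range(height + 1)]


levels = keys_per_level(1000, 2)
total = sum(levels)
print(levels, total)
```

With 1000 keys per node and height 2, the three levels hold 1000, 1,001,000, and 1,002,001,000 keys, for a total of 1,003,003,000.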


18.1 Definition of B-trees

To  keep  things  simple,  we  assume,  as  we  have  for  binary  search  trees  and  red-black 
trees,  that  any  “satellite  information”  associated  with  a  key  resides  in  the  same 
node  as  the  key.  In  practice,  one  might  actually  store  with  each  key  just  a  pointer  to 
another  disk  page  containing  the  satellite  information  for  that  key.  The  pseudocode 
in  this  chapter  implicitly  assumes  that  the  satellite  information  associated  with  a 
key,  or  the  pointer  to  such  satellite  information,  travels  with  the  key  whenever  the 
key  is  moved  from  node  to  node.  A  common  variant  on  a  B-tree,  known  as  a 
B+-tree,  stores  all  the  satellite  information  in  the  leaves  and  stores  only  keys  and 
child  pointers  in  the  internal  nodes,  thus  maximizing  the  branching  factor  of  the 
internal  nodes. 

A B-tree T is a rooted tree (whose root is T.root) having the following properties:

1. Every node x has the following attributes:

a. x.n, the number of keys currently stored in node x,

b. the x.n keys themselves, x.key_1, x.key_2, ..., x.key_{x.n}, stored in nondecreasing
order, so that x.key_1 ≤ x.key_2 ≤ ... ≤ x.key_{x.n},

c. x.leaf, a boolean value that is TRUE if x is a leaf and FALSE if x is an internal
node.

2. Each internal node x also contains x.n + 1 pointers x.c_1, x.c_2, ..., x.c_{x.n+1} to
its children. Leaf nodes have no children, and so their c_i attributes are undefined.

3. The keys x.key_i separate the ranges of keys stored in each subtree: if k_i is any
key stored in the subtree with root x.c_i, then

k_1 ≤ x.key_1 ≤ k_2 ≤ x.key_2 ≤ ... ≤ x.key_{x.n} ≤ k_{x.n+1} .

4. All leaves have the same depth, which is the tree’s height h.

5. Nodes have lower and upper bounds on the number of keys they can contain.
We express these bounds in terms of a fixed integer t ≥ 2 called the minimum
degree of the B-tree:

a. Every node other than the root must have at least t - 1 keys. Every internal
node other than the root thus has at least t children. If the tree is nonempty,
the root must have at least one key.

b. Every node may contain at most 2t - 1 keys. Therefore, an internal node
may have at most 2t children. We say that a node is full if it contains exactly
2t - 1 keys.2

The  simplest  B-tree  occurs  when  t  =  2.  Every  internal  node  then  has  either  2, 
3,  or  4  children,  and  we  have  a  2-3-4  tree.  In  practice,  however,  much  larger  values 
of t yield B-trees with smaller height.
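
The five properties translate directly into a checking routine. The Python sketch below is one possible rendering, not a definitive implementation: the class BTreeNode and the function check_btree are names of our own choosing, keys and children are held in 0-based Python lists, and a node counts as a leaf exactly when it has no children. The function returns the height of a valid subtree and raises ValueError otherwise.

```python
class BTreeNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []           # x.key_1 .. x.key_{x.n}
        self.children = children or []   # x.c_1 .. x.c_{x.n+1}; empty for a leaf

    @property
    def leaf(self):
        return not self.children


def check_btree(node, t, is_root=True, low=None, high=None):
    """Return the height of the subtree rooted at node if it satisfies
    properties 1-5 for minimum degree t; otherwise raise ValueError."""
    n = len(node.keys)
    # Property 5: bounds on the number of keys (the root is a special case).
    min_keys = (0 if node.leaf else 1) if is_root else t - 1
    if not (min_keys <= n <= 2 * t - 1):
        raise ValueError("key count out of bounds")
    # Property 1b: keys in nondecreasing order.
    if any(a > b for a, b in zip(node.keys, node.keys[1:])):
        raise ValueError("keys not in nondecreasing order")
    # Property 3: keys lie within the range set by the ancestors.
    if node.keys:
        if low is not None and node.keys[0] < low:
            raise ValueError("key below the allowed range")
        if high is not None and node.keys[-1] > high:
            raise ValueError("key above the allowed range")
    if node.leaf:
        return 0
    # Property 2: an internal node with n keys has n + 1 children.
    if len(node.children) != n + 1:
        raise ValueError("internal node needs n + 1 children")
    bounds = [low] + node.keys + [high]
    heights = {check_btree(child, t, False, bounds[i], bounds[i + 1])
               for i, child in enumerate(node.children)}
    # Property 4: all leaves have the same depth.
    if len(heights) != 1:
        raise ValueError("leaves at different depths")
    return heights.pop() + 1


root = BTreeNode([10, 20],
                 [BTreeNode([1, 5]), BTreeNode([12]), BTreeNode([25, 30, 40])])
print(check_btree(root, t=2))
```

The example tree is a small 2-3-4 tree (t = 2) of height 1: its root has three children, and each leaf holds between one and three keys.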

The  height  of  a  B-tree 

The number of disk accesses required for most operations on a B-tree is proportional
to the height of the B-tree. We now analyze the worst-case height of a B-tree.

Theorem 18.1

If n ≥ 1, then for any n-key B-tree T of height h and minimum degree t ≥ 2,

$$h \le \log_t \frac{n+1}{2} .$$

Proof The root of a B-tree T contains at least one key, and all other nodes contain
at least t - 1 keys. Thus, T, whose height is h, has at least 2 nodes at depth 1, at
least 2t nodes at depth 2, at least 2t^2 nodes at depth 3, and so on, until at depth h
it has at least 2t^{h-1} nodes. Figure 18.4 illustrates such a tree for h = 3. Thus, the
number n of keys satisfies the inequality

$$n \ge 1 + (t-1) \sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\left(\frac{t^h - 1}{t-1}\right) = 2t^h - 1 .$$

By simple algebra, we get t^h ≤ (n + 1)/2. Taking base-t logarithms of both sides
proves the theorem. ■

Figure 18.4 A B-tree of height 3 containing a minimum possible number of keys. Shown inside
each node x is x.n.

2 Another common variant on a B-tree, known as a B*-tree, requires each internal node to be at
least 2/3 full, rather than at least half full, as a B-tree requires.

Here  we  see  the  power  of  B-trees,  as  compared  with  red-black  trees.  Although 
the height of the tree grows as O(lg n) in both cases (recall that t is a constant), for
B-trees  the  base  of  the  logarithm  can  be  many  times  larger.  Thus,  B-trees  save  a 
factor  of  about  lg  t  over  red-black  trees  in  the  number  of  nodes  examined  for  most 
tree  operations.  Because  we  usually  have  to  access  the  disk  to  examine  an  arbitrary 
node  in  a  tree,  B-trees  avoid  a  substantial  number  of  disk  accesses. 
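
To see how strongly the minimum degree affects the bound of Theorem 18.1, the following Python sketch evaluates it for two values of t. The function names are ours. The sketch uses integer arithmetic (the largest h with 2t^h - 1 ≤ n) rather than floating-point logarithms, an implementation choice made to avoid rounding problems.

```python
def min_keys(t, h):
    """Fewest keys in a B-tree of minimum degree t and height h: 2*t**h - 1."""
    return 2 * t ** h - 1


def max_height(n, t):
    """Largest h with 2*t**h - 1 <= n, that is, floor(log_t((n + 1) / 2))."""
    h = 0
    while min_keys(t, h + 1) <= n:
        h += 1
    return h


n = 10 ** 9
for t in (2, 1001):
    print("t =", t, "maximum height:", max_height(n, t))
```

For one billion keys, the bound allows a height of up to 28 when t = 2 but at most 2 when t = 1001.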

Exercises 


18.1-1 

Why don’t we allow a minimum degree of t = 1?


18.1-2 

For what values of t is the tree of Figure 18.1 a legal B-tree?


18.1-3

Show  all  legal  B-trees  of  minimum  degree  2  that  represent  {1, 2,  3,  4,  5}. 


18.1-4 

As  a  function  of  the  minimum  degree  t,  what  is  the  maximum  number  of  keys  that 
can be stored in a B-tree of height h?


18.1-5 

Describe  the  data  structure  that  would  result  if  each  black  node  in  a  red-black  tree 
were  to  absorb  its  red  children,  incorporating  their  children  with  its  own. 


18.2  Basic  operations  on  B-trees 

In this section, we present the details of the operations B-Tree-Search,
B-Tree-Create, and B-Tree-Insert. In these procedures, we adopt two conventions:

•  The  root  of  the  B-tree  is  always  in  main  memory,  so  that  we  never  need  to 
perform  a  Disk-Read  on  the  root;  we  do  have  to  perform  a  Disk-Write  of 
the  root,  however,  whenever  the  root  node  is  changed. 

•  Any  nodes  that  are  passed  as  parameters  must  already  have  had  a  Disk-Read 
operation  performed  on  them. 

The  procedures  we  present  are  all  “one-pass”  algorithms  that  proceed  downward 
from  the  root  of  the  tree,  without  having  to  back  up. 

Searching  a  B-tree 

Searching  a  B-tree  is  much  like  searching  a  binary  search  tree,  except  that  instead 
of  making  a  binary,  or  “two-way,”  branching  decision  at  each  node,  we  make  a 
multiway  branching  decision  according  to  the  number  of  the  node’s  children.  More 
precisely, at each internal node x, we make an (x.n + 1)-way branching decision.

B-Tree-Search is a straightforward generalization of the Tree-Search procedure
defined for binary search trees. B-Tree-Search takes as input a pointer
to the root node x of a subtree and a key k to be searched for in that subtree. The
top-level call is thus of the form B-Tree-Search(T.root, k). If k is in the B-tree,
B-Tree-Search returns the ordered pair (y, i) consisting of a node y and an
index i such that y.key_i = k. Otherwise, the procedure returns NIL.

B-Tree-Search(x, k)
1  i = 1
2  while i ≤ x.n and k > x.key_i
3      i = i + 1
4  if i ≤ x.n and k == x.key_i
5      return (x, i)
6  elseif x.leaf
7      return NIL
8  else Disk-Read(x.c_i)
9      return B-Tree-Search(x.c_i, k)

Using a linear-search procedure, lines 1-3 find the smallest index i such that
k ≤ x.key_i, or else they set i to x.n + 1. Lines 4-5 check to see whether we
have now discovered the key, returning if we have. Otherwise, lines 6-9 either terminate
the search unsuccessfully (if x is a leaf) or recurse to search the appropriate
subtree of x, after performing the necessary Disk-Read on that child.

Figure 18.1 illustrates the operation of B-Tree-Search. The procedure examines
the lightly shaded nodes during a search for the key R.

As in the Tree-Search procedure for binary search trees, the nodes encountered
during the recursion form a simple path downward from the root of the
tree. The B-Tree-Search procedure therefore accesses O(h) = O(log_t n) disk
pages, where h is the height of the B-tree and n is the number of keys in the B-tree.
Since x.n < 2t, the while loop of lines 2-3 takes O(t) time within each node, and
the total CPU time is O(th) = O(t log_t n).
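
The search procedure carries over almost line for line to a conventional programming language. The Python sketch below makes several assumptions of our own: the class Node, the function btree_search, 0-based list indices in place of the 1-based indices of the pseudocode, and an in-memory tree, so that the point where Disk-Read would occur is marked only by a comment. The example tree is a small one built by hand.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list for a leaf

    @property
    def leaf(self):
        return not self.children


def btree_search(x, k):
    """Return (node, index) such that node.keys[index] == k, or None.
    Indices are 0-based, unlike the 1-based pseudocode."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return (x, i)
    if x.leaf:
        return None
    # A disk-based implementation would perform Disk-Read(x.c_i) here.
    return btree_search(x.children[i], k)


left = Node(["D", "H"], [Node(["B", "C"]), Node(["F", "G"]), Node(["J", "K", "L"])])
right = Node(["Q", "T", "X"],
             [Node(["N", "P"]), Node(["R", "S"]), Node(["V", "W"]), Node(["Y", "Z"])])
root = Node(["M"], [left, right])

found = btree_search(root, "R")
print(found[0].keys, found[1])
print(btree_search(root, "A"))
```

The search for R visits the root, the node holding Q, T, and X, and finally the leaf holding R and S, one node per level.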

Creating  an  empty  B-tree 

To build a B-tree T, we first use B-Tree-Create to create an empty root node
and then call B-Tree-Insert to add new keys. Both of these procedures use an
auxiliary procedure Allocate-Node, which allocates one disk page to be used
as a new node in O(1) time. We can assume that a node created by Allocate-Node
requires no Disk-Read, since there is as yet no useful information stored
on the disk for that node.

B-Tree-Create(T)
1  x = Allocate-Node()
2  x.leaf = TRUE
3  x.n = 0
4  Disk-Write(x)
5  T.root = x

B-Tree-Create requires O(1) disk operations and O(1) CPU time.

Inserting a key into a B-tree

Inserting  a  key  into  a  B-tree  is  significantly  more  complicated  than  inserting  a  key 
into  a  binary  search  tree.  As  with  binary  search  trees,  we  search  for  the  leaf  position 
at  which  to  insert  the  new  key.  With  a  B-tree,  however,  we  cannot  simply  create 
a  new  leaf  node  and  insert  it,  as  the  resulting  tree  would  fail  to  be  a  valid  B-tree. 
Instead,  we  insert  the  new  key  into  an  existing  leaf  node.  Since  we  cannot  insert  a 
key  into  a  leaf  node  that  is  full,  we  introduce  an  operation  that  splits  a  full  node  y 
(having 2t - 1 keys) around its median key y.key_t into two nodes having only t - 1
keys each. The median key moves up into y’s parent to identify the dividing point
between the two new trees. But if y’s parent is also full, we must split it before we
can  insert  the  new  key,  and  thus  we  could  end  up  splitting  full  nodes  all  the  way  up 
the  tree. 

As  with  a  binary  search  tree,  we  can  insert  a  key  into  a  B-tree  in  a  single  pass 
down  the  tree  from  the  root  to  a  leaf.  To  do  so,  we  do  not  wait  to  find  out  whether 
we  will  actually  need  to  split  a  full  node  in  order  to  do  the  insertion.  Instead,  as  we 
travel  down  the  tree  searching  for  the  position  where  the  new  key  belongs,  we  split 
each  full  node  we  come  to  along  the  way  (including  the  leaf  itself).  Thus  whenever 
we  want  to  split  a  full  node  y,  we  are  assured  that  its  parent  is  not  full. 

Splitting  a  node  in  a  B-tree 

The procedure B-Tree-Split-Child takes as input a nonfull internal node x (assumed
to be in main memory) and an index i such that x.c_i (also assumed to be in
main memory) is a full child of x. The procedure then splits this child in two and
adjusts x so that it has an additional child. To split a full root, we will first make the
root a child of a new empty root node, so that we can use B-Tree-Split-Child.
The tree thus grows in height by one; splitting is the only means by which the tree
grows.

Figure 18.5 illustrates this process. We split the full node y = x.c_i about its
median key S, which moves up into y’s parent node x. Those keys in y that are
greater than the median key move into a new node z, which becomes a new child
of x.

Figure 18.5 Splitting a node with t = 4. Node y = x.c_i splits into two nodes, y and z, and the
median key S of y moves up into y’s parent.


B-Tree-Split-Child(x, i)
1  z = Allocate-Node()
2  y = x.c_i
3  z.leaf = y.leaf
4  z.n = t - 1
5  for j = 1 to t - 1
6      z.key_j = y.key_{j+t}
7  if not y.leaf
8      for j = 1 to t
9          z.c_j = y.c_{j+t}
10 y.n = t - 1
11 for j = x.n + 1 downto i + 1
12     x.c_{j+1} = x.c_j
13 x.c_{i+1} = z
14 for j = x.n downto i
15     x.key_{j+1} = x.key_j
16 x.key_i = y.key_t
17 x.n = x.n + 1
18 Disk-Write(y)
19 Disk-Write(z)
20 Disk-Write(x)

B-Tree-Split-Child works by straightforward “cutting and pasting.” Here, x
is the parent of the node being split, and y is x’s ith child (set in line 2). Node y originally has 2t
children (2t - 1 keys) but is reduced to t children (t - 1 keys) by this operation.
Node z takes the t largest children (t - 1 keys) from y, and z becomes a new child
of x, positioned just after y in x’s table of children. The median key of y moves
up to become the key in x that separates y and z.

Lines 1-9 create node z and give it the largest t - 1 keys and corresponding t
children of y. Line 10 adjusts the key count for y. Finally, lines 11-17 insert z as
a child of x, move the median key from y up to x in order to separate y from z,
and adjust x’s key count. Lines 18-20 write out all modified disk pages. The
CPU time used by B-Tree-Split-Child is Θ(t), due to the loops on lines 5-6
and 8-9. (The other loops run for O(t) iterations.) The procedure performs O(1)
disk operations.
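
With Python lists, the “cutting and pasting” reduces to a few slices. The sketch below is an in-memory illustration under our own assumptions: the names Node and split_child are ours, indices are 0-based, the minimum degree t is passed as a parameter, and the three Disk-Write calls are omitted because nothing resides on disk. The example reproduces the situation of Figure 18.5 with t = 4, using leaf nodes to keep it short.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list for a leaf

    @property
    def leaf(self):
        return not self.children


def split_child(x, i, t):
    """Split the full child x.children[i] (0-based) around its median key."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "the child being split must be full"
    median = y.keys[t - 1]
    # z receives the largest t - 1 keys and, if y is internal, the t largest children.
    z = Node(y.keys[t:], y.children[t:])
    y.keys = y.keys[:t - 1]
    y.children = y.children[:t]
    # Make room in x for the median key and for the new child z.
    x.keys.insert(i, median)
    x.children.insert(i + 1, z)


y = Node(["P", "Q", "R", "S", "T", "U", "V"])
x = Node(["N", "W"], [Node(["A"]), y, Node(["Z"])])
split_child(x, 1, t=4)
print(x.keys, x.children[1].keys, x.children[2].keys)
```

After the call, x holds N, S, and W, the old child y keeps P, Q, and R, and the new child z holds T, U, and V.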

Inserting a key into a B-tree in a single pass down the tree

We insert a key k into a B-tree T of height h in a single pass down the tree, requiring
O(h) disk accesses. The CPU time required is O(th) = O(t log_t n). The
B-Tree-Insert procedure uses B-Tree-Split-Child to guarantee that the recursion
never descends to a full node.

B-Tree-Insert(T, k)
1  r = T.root
2  if r.n == 2t - 1
3      s = Allocate-Node()
4      T.root = s
5      s.leaf = FALSE
6      s.n = 0
7      s.c_1 = r
8      B-Tree-Split-Child(s, 1)
9      B-Tree-Insert-Nonfull(s, k)
10 else B-Tree-Insert-Nonfull(r, k)

Lines  3-9  handle  the  case  in  which  the  root  node  r  is  full:  the  root  splits  and  a 
new  node  s  (having  two  children)  becomes  the  root.  Splitting  the  root  is  the  only 
way  to  increase  the  height  of  a  B-tree.  Figure  18.6  illustrates  this  case.  Unlike  a 
binary  search  tree,  a  B-tree  increases  in  height  at  the  top  instead  of  at  the  bottom. 
The  procedure  finishes  by  calling  B-Tree-Insert-Nonfull  to  insert  key  k  into 
the  tree  rooted  at  the  nonfull  root  node.  B-Tree-Insert-Nonfull  recurses  as 
necessary  down  the  tree,  at  all  times  guaranteeing  that  the  node  to  which  it  recurses 
is  not  full  by  calling  B-Tree-Split-Child  as  necessary. 

The  auxiliary  recursive  procedure  B-Tree-Insert-Nonfull  inserts  key  k  into 
node  x,  which  is  assumed  to  be  nonfull  when  the  procedure  is  called.  The  operation 
of  B-Tree-Insert  and  the  recursive  operation  of  B-Tree-Insert-Nonfull 
guarantee that this assumption is true.

Figure 18.6 Splitting the root with t = 4. Root node r splits in two, and a new root node s is
created. The new root contains the median key of r and has the two halves of r as children. The
B-tree grows in height by one when the root is split.


B-Tree-Insert-Nonfull(x, k)
1  i = x.n
2  if x.leaf
3      while i ≥ 1 and k < x.key_i
4          x.key_{i+1} = x.key_i
5          i = i - 1
6      x.key_{i+1} = k
7      x.n = x.n + 1
8      Disk-Write(x)
9  else while i ≥ 1 and k < x.key_i
10         i = i - 1
11     i = i + 1
12     Disk-Read(x.c_i)
13     if x.c_i.n == 2t - 1
14         B-Tree-Split-Child(x, i)
15         if k > x.key_i
16             i = i + 1
17     B-Tree-Insert-Nonfull(x.c_i, k)

The  B-Tree-Insert-Nonfull  procedure  works  as  follows.  Lines  3-8  handle 
the  case  in  which  x  is  a  leaf  node  by  inserting  key  k  into  x.  If  x  is  not  a  leaf 
node,  then  we  must  insert  k  into  the  appropriate  leaf  node  in  the  subtree  rooted 
at internal node x. In this case, lines 9-11 determine the child of x to which the
recursion descends. Line 13 detects whether the recursion would descend to a full
child, in which case line 14 uses B-Tree-Split-Child to split that child into two
nonfull children, and lines 15-16 determine which of the two children is now the
correct one to descend to. (Note that there is no need for a Disk-Read(x.c_i) after
line 16 increments i, since the recursion will descend in this case to a child that
was just created by B-Tree-Split-Child.) The net effect of lines 13-16 is thus
to  guarantee  that  the  procedure  never  recurses  to  a  full  node.  Line  17  then  recurses 
to  insert  k  into  the  appropriate  subtree.  Figure  18.7  illustrates  the  various  cases  of 
inserting  into  a  B-tree. 
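
As an illustration only, the insertion procedure can be sketched in Python. The sketch keeps every node in memory, so the Disk-Read and Disk-Write calls disappear, and it uses 0-based Python lists in place of the 1-based key and child arrays. The names BTreeNode, split_child, insert_nonfull, insert, and inorder are our own choices for this sketch, and split_child is our reading of what B-Tree-Split-Child must do, not a transcription of it.

```python
class BTreeNode:
    """In-memory stand-in for a B-tree node (no real disk I/O)."""
    def __init__(self, leaf=True):
        self.keys = []       # sorted keys; len(keys) plays the role of x.n
        self.children = []   # child nodes; empty for a leaf
        self.leaf = leaf


def split_child(x, i, t):
    """Split the full child x.children[i] (2t-1 keys) around its median."""
    y = x.children[i]
    z = BTreeNode(leaf=y.leaf)
    median = y.keys[t - 1]
    z.keys = y.keys[t:]            # largest t-1 keys move to z
    y.keys = y.keys[:t - 1]        # smallest t-1 keys stay in y
    if not y.leaf:
        z.children = y.children[t:]
        y.children = y.children[:t]
    x.keys.insert(i, median)       # median moves up into x
    x.children.insert(i + 1, z)


def insert_nonfull(x, k, t):
    """Insert k into the subtree rooted at x, assuming x is not full."""
    i = len(x.keys) - 1            # 0-based index of the last key
    while i >= 0 and k < x.keys[i]:
        i -= 1
    if x.leaf:
        x.keys.insert(i + 1, k)    # list.insert shifts the larger keys right
        return
    i += 1                         # index of the child to descend into
    if len(x.children[i].keys) == 2 * t - 1:
        split_child(x, i, t)
        if k > x.keys[i]:          # x.keys[i] is now the promoted median
            i += 1
    insert_nonfull(x.children[i], k, t)


def insert(root, k, t):
    """Insert k and return the (possibly new) root."""
    if len(root.keys) == 2 * t - 1:
        s = BTreeNode(leaf=False)  # new root; the tree grows in height
        s.children.append(root)
        split_child(s, 0, t)
        root = s
    insert_nonfull(root, k, t)
    return root


def inorder(x):
    """Yield the keys of the subtree rooted at x in sorted order."""
    for j, key in enumerate(x.keys):
        if not x.leaf:
            yield from inorder(x.children[j])
        yield key
    if not x.leaf:
        yield from inorder(x.children[-1])
```

Because insert may replace the root, callers in this sketch rebind it on every call: root = insert(root, k, t).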

For a B-tree of height h, B-Tree-Insert performs O(h) disk accesses, since
only O(1) Disk-Read and Disk-Write operations occur between calls to
B-Tree-Insert-Nonfull. The total CPU time used is O(th) = O(t log_t n).
Since B-Tree-Insert-Nonfull is tail-recursive, we can alternatively implement
it as a while loop, thereby demonstrating that the number of pages that need
to be in main memory at any time is O(1).
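
The loop form can be sketched as follows, again as an in-memory illustration under our own naming (Node, split_child, insert_iterative) rather than a definitive implementation. The sketch uses Python's bisect module for the search within a node, which is a substitution for the linear scan in the pseudocode. At any moment the loop refers only to the current node x, the child it is about to enter, and the sibling created by a split.

```python
import bisect


class Node:
    """Hypothetical in-memory node; a real B-tree would page these from disk."""
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf


def split_child(x, i, t):
    """Split the full child x.children[i] around its median key."""
    y = x.children[i]
    z = Node(leaf=y.leaf)
    x.keys.insert(i, y.keys[t - 1])            # median moves up into x
    z.keys, y.keys = y.keys[t:], y.keys[:t - 1]
    if not y.leaf:
        z.children, y.children = y.children[t:], y.children[:t]
    x.children.insert(i + 1, z)


def insert_iterative(root, k, t):
    """Insert k with a loop instead of tail recursion; return the root."""
    if len(root.keys) == 2 * t - 1:
        s = Node(leaf=False)
        s.children.append(root)
        split_child(s, 0, t)
        root = s
    x = root                                   # the only node we hold on to
    while not x.leaf:
        i = bisect.bisect_right(x.keys, k)     # index of the child to descend into
        if len(x.children[i].keys) == 2 * t - 1:
            split_child(x, i, t)
            if k > x.keys[i]:
                i += 1
        x = x.children[i]                      # drop the parent, keep the child
    bisect.insort(x.keys, k)                   # x is a nonfull leaf here
    return root
```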

Exercises 


18.2-1 

Show  the  results  of  inserting  the  keys 

F, S, Q, K, C, L, H, T, V, W, M, R, N, P, A, B, X, Y, D, Z, E

in order into an empty B-tree with minimum degree 2. Draw only the configurations
of the tree just before some node must split, and also draw the final
configuration.


18.2-2 

Explain  under  what  circumstances,  if  any,  redundant  Disk-Read  or  Disk-Write 
operations  occur  during  the  course  of  executing  a  call  to  B-Tree-Insert.  (A 
redundant  Disk-Read  is  a  Disk-Read  for  a  page  that  is  already  in  memory. 
A  redundant  Disk-Write  writes  to  disk  a  page  of  information  that  is  identical  to 
what  is  already  stored  there.) 


18.2-3

Explain how to find the minimum key stored in a B-tree and how to find the
predecessor of a given key stored in a B-tree.

18.2-4 *

Suppose that we insert the keys {1, 2, ..., n} into an empty B-tree with minimum
degree 2. How many nodes does the final B-tree have?


18.2-5 

Since leaf nodes require no pointers to children, they could conceivably use a
different (larger) t value than internal nodes for the same disk page size. Show how
to modify the procedures for creating and inserting into a B-tree to handle this
variation.

Figure 18.7  Inserting keys into a B-tree. The minimum degree t for this B-tree is 3, so a node can
hold at most 5 keys. Nodes that are modified by the insertion process are lightly shaded. (a) The
initial tree for this example. (b) The result of inserting B into the initial tree; this is a simple insertion
into a leaf node. (c) The result of inserting Q into the previous tree. The node RSTUV splits into
two nodes containing RS and UV, the key T moves up to the root, and Q is inserted in the leftmost
of the two halves (the RS node). (d) The result of inserting L into the previous tree. The root
splits right away, since it is full, and the B-tree grows in height by one. Then L is inserted into the
leaf containing JK. (e) The result of inserting F into the previous tree. The node ABCDE splits
before F is inserted into the rightmost of the two halves (the DE node).

18.2-6

Suppose that we were to implement B-Tree-Search to use binary search rather
than linear search within each node. Show that this change makes the CPU time
required O(lg n), independently of how t might be chosen as a function of n.


18.2-7 

Suppose  that  disk  hardware  allows  us  to  choose  the  size  of  a  disk  page  arbitrarily, 
but  that  the  time  it  takes  to  read  the  disk  page  is  a  +  bt,  where  a  and  b  are  specified 
constants  and  t  is  the  minimum  degree  for  a  B-tree  using  pages  of  the  selected  size. 
Describe  how  to  choose  t  so  as  to  minimize  (approximately)  the  B-tree  search  time. 
Suggest  an  optimal  value  of  t  for  the  case  in  which  a  =  5  milliseconds  and  b  =  10 
microseconds. 


18.3  Deleting  a  key  from  a  B-tree 

Deletion from a B-tree is analogous to insertion but a little more complicated,
because we can delete a key from any node (not just a leaf), and when we delete a
key from an internal node, we will have to rearrange the node’s children. As in
insertion, we must guard against deletion producing a tree whose structure violates
the B-tree properties. Just as we had to ensure that a node didn’t get too big due to
insertion, we must ensure that a node doesn’t get too small during deletion (except
that the root is allowed to have fewer than the minimum number t − 1 of keys).
Just as a simple insertion algorithm might have to back up if a node on the path
to where the key was to be inserted was full, a simple approach to deletion might
have to back up if a node (other than the root) along the path to where the key is to
be deleted has the minimum number of keys.

The procedure B-Tree-Delete deletes the key k from the subtree rooted at x.
We design this procedure to guarantee that whenever it calls itself recursively on a
node x, the number of keys in x is at least the minimum degree t. Note that this
condition requires one more key than the minimum required by the usual B-tree
conditions, so that sometimes a key may have to be moved into a child node before
recursion descends to that child. This strengthened condition allows us to delete a
key from the tree in one downward pass without having to “back up” (with one
exception, which we’ll explain). You should interpret the following specification for
deletion from a B-tree with the understanding that if the root node x ever becomes
an internal node having no keys (this situation can occur in cases 2c and 3b,
described below), then we delete x, and x’s only child x.c_1 becomes the new root
of the tree, decreasing the height of the tree by one and preserving the property that
the root of the tree contains at least one key (unless the tree is empty).

Figure 18.8  Deleting keys from a B-tree. The minimum degree for this B-tree is t = 3, so a node
(other than the root) cannot have fewer than 2 keys. Nodes that are modified are lightly shaded.
(a) The B-tree of Figure 18.7(e). (b) Deletion of F. This is case 1: simple deletion from a leaf.
(c) Deletion of M. This is case 2a: the predecessor L of M moves up to take M’s position. (d) Deletion
of G. This is case 2c: we push G down to make node DEGJK and then delete G from this leaf
(case 1).

We  sketch  how  deletion  works  instead  of  presenting  the  pseudocode.  Figure  18.8 
illustrates  the  various  cases  of  deleting  keys  from  a  B-tree. 

1.  If  the  key  k  is  in  node  x  and  x  is  a  leaf,  delete  the  key  k  from  x. 

2.  If  the  key  k  is  in  node  x  and  x  is  an  internal  node,  do  the  following: 

Figure 18.8, continued  (e) Deletion of D. This is case 3b: the recursion cannot descend to
node CL because it has only 2 keys, so we push P down and merge it with CL and TX to form
CLPTX; then we delete D from a leaf (case 1). (e') After (e), we delete the root and the tree shrinks
in height by one. (f) Deletion of B. This is case 3a: C moves to fill B’s position and E moves to
fill C’s position.

a.  If the child y that precedes k in node x has at least t keys, then find the
predecessor  k'  of  k  in  the  subtree  rooted  at  y.  Recursively  delete  k',  and 
replace  k  by  k'  in  x.  (We  can  find  k'  and  delete  it  in  a  single  downward 
pass.) 

b.  If y has fewer than t keys, then, symmetrically, examine the child z that
follows k in node x. If z has at least t keys, then find the successor k' of k in
the subtree rooted at z. Recursively delete k', and replace k by k' in x. (We
can find k' and delete it in a single downward pass.)

c.  Otherwise, if both y and z have only t − 1 keys, merge k and all of z into y,
so that x loses both k and the pointer to z, and y now contains 2t − 1 keys.
Then free z and recursively delete k from y.

3.  If the key k is not present in internal node x, determine the root x.c_i of the
appropriate subtree that must contain k, if k is in the tree at all. If x.c_i has
only t − 1 keys, execute step 3a or 3b as necessary to guarantee that we descend
to a node containing at least t keys. Then finish by recursing on the appropriate
child of x.

a.  If x.c_i has only t − 1 keys but has an immediate sibling with at least t keys,
give x.c_i an extra key by moving a key from x down into x.c_i, moving a
key from x.c_i’s immediate left or right sibling up into x, and moving the
appropriate child pointer from the sibling into x.c_i.

b.  If x.c_i and both of x.c_i’s immediate siblings have t − 1 keys, merge x.c_i
with one sibling, which involves moving a key from x down into the new
merged node to become the median key for that node.
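
The cases above can be sketched in Python as follows. This is one possible in-memory rendering under our own assumptions, not the book's pseudocode: the names Node, merge, extreme_key, delete, and btree_delete are hypothetical, keys are assumed distinct, disk operations are omitted, and cases 2a and 2b first look up the predecessor or successor and then delete it in a second descent rather than in a single pass.

```python
class Node:
    """Hypothetical in-memory B-tree node; a leaf has no children."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)


def merge(x, i):
    """Pull x.keys[i] and all of x.children[i+1] into x.children[i]."""
    y, z = x.children[i], x.children.pop(i + 1)
    y.keys.append(x.keys.pop(i))
    y.keys.extend(z.keys)
    y.children.extend(z.children)
    return y


def extreme_key(x, side):
    """Smallest (side=0) or largest (side=-1) key in the subtree rooted at x."""
    while x.children:
        x = x.children[side]
    return x.keys[side]


def delete(x, k, t):
    """Delete k below x; x has at least t keys unless it is the root."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and x.keys[i] == k:
        if not x.children:                        # case 1: k is in a leaf
            x.keys.pop(i)
            return
        y, z = x.children[i], x.children[i + 1]
        if len(y.keys) >= t:                      # case 2a: use the predecessor
            x.keys[i] = extreme_key(y, -1)
            delete(y, x.keys[i], t)
        elif len(z.keys) >= t:                    # case 2b: use the successor
            x.keys[i] = extreme_key(z, 0)
            delete(z, x.keys[i], t)
        else:                                     # case 2c: merge, then recurse
            delete(merge(x, i), k, t)
        return
    if not x.children:
        return                                    # k is not in the tree
    c = x.children[i]
    if len(c.keys) == t - 1:                      # case 3: fatten c first
        left = x.children[i - 1] if i > 0 else None
        right = x.children[i + 1] if i < len(x.keys) else None
        if left is not None and len(left.keys) >= t:      # 3a, borrow from left
            c.keys.insert(0, x.keys[i - 1])
            x.keys[i - 1] = left.keys.pop()
            if left.children:
                c.children.insert(0, left.children.pop())
        elif right is not None and len(right.keys) >= t:  # 3a, borrow from right
            c.keys.append(x.keys[i])
            x.keys[i] = right.keys.pop(0)
            if right.children:
                c.children.append(right.children.pop(0))
        elif right is not None:                           # 3b, merge with right
            merge(x, i)
        else:                                             # 3b, merge with left
            c = merge(x, i - 1)
    delete(c, k, t)


def btree_delete(root, k, t):
    """Delete k and return the root, dropping an emptied internal root."""
    delete(root, k, t)
    if not root.keys and root.children:
        return root.children[0]                   # the tree shrinks in height
    return root
```

As with insertion, the caller rebinds the root on each call, since btree_delete returns a different node when the old root loses its last key.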

Since  most  of  the  keys  in  a  B-tree  are  in  the  leaves,  we  may  expect  that  in 
practice,  deletion  operations  are  most  often  used  to  delete  keys  from  leaves.  The 
B-Tree-Delete procedure then acts in one downward pass through the tree,
without  having  to  back  up.  When  deleting  a  key  in  an  internal  node,  however, 
the  procedure  makes  a  downward  pass  through  the  tree  but  may  have  to  return  to 
the  node  from  which  the  key  was  deleted  to  replace  the  key  with  its  predecessor  or 
successor  (cases  2a  and  2b). 

Although this procedure seems complicated, it involves only O(h) disk operations
for a B-tree of height h, since only O(1) calls to Disk-Read and Disk-Write
are made between recursive invocations of the procedure. The CPU time
required is O(th) = O(t log_t n).

Exercises 


18.3-1 

Show the results of deleting C, P, and V, in order, from the tree of Figure 18.8(f).


18.3-2 

Write pseudocode for B-Tree-Delete.


Problems 


18-1  Stacks  on  secondary  storage 

Consider  implementing  a  stack  in  a  computer  that  has  a  relatively  small  amount 
of  fast  primary  memory  and  a  relatively  large  amount  of  slower  disk  storage.  The 
operations  PUSH  and  POP  work  on  single-word  values.  The  stack  we  wish  to 
support  can  grow  to  be  much  larger  than  can  fit  in  memory,  and  thus  most  of  it 
must  be  stored  on  disk. 

A  simple,  but  inefficient,  stack  implementation  keeps  the  entire  stack  on  disk. 
We  maintain  in  memory  a  stack  pointer,  which  is  the  disk  address  of  the  top  element 
on the stack. If the pointer has value p, the top element is the (p mod m)th word
on page ⌊p/m⌋ of the disk, where m is the number of words per page.

To implement the PUSH operation, we increment the stack pointer, read the
appropriate page into memory from disk, copy the element to be pushed to the
appropriate word on the page, and write the page back to disk. A POP operation is
similar. We decrement the stack pointer, read in the appropriate page from disk,
and return the top of the stack. We need not write back the page, since it was not
modified.

Because  disk  operations  are  relatively  expensive,  we  count  two  costs  for  any 
implementation:  the  total  number  of  disk  accesses  and  the  total  CPU  time.  Any 
disk access to a page of m words incurs charges of one disk access and Θ(m) CPU
time. 

a.  Asymptotically,  what  is  the  worst-case  number  of  disk  accesses  for  n  stack 
operations  using  this  simple  implementation?  What  is  the  CPU  time  for  n  stack 
operations?  (Express  your  answer  in  terms  of  m  and  n  for  this  and  subsequent 
parts.) 

Now  consider  a  stack  implementation  in  which  we  keep  one  page  of  the  stack  in 
memory.  (We  also  maintain  a  small  amount  of  memory  to  keep  track  of  which  page 
is  currently  in  memory.)  We  can  perform  a  stack  operation  only  if  the  relevant  disk 
page  resides  in  memory.  If  necessary,  we  can  write  the  page  currently  in  memory 
to  the  disk  and  read  in  the  new  page  from  the  disk  to  memory.  If  the  relevant  disk 
page  is  already  in  memory,  then  no  disk  accesses  are  required. 

b.  What is the worst-case number of disk accesses required for n PUSH operations?
What is the CPU time?

c.  What  is  the  worst-case  number  of  disk  accesses  required  for  n  stack  operations? 
What  is  the  CPU  time? 

Suppose  that  we  now  implement  the  stack  by  keeping  two  pages  in  memory  (in 
addition  to  a  small  number  of  words  for  bookkeeping). 

d.  Describe how to manage the stack pages so that the amortized number of disk
accesses for any stack operation is O(1/m) and the amortized CPU time for
any stack operation is O(1).

18-2  Joining  and  splitting  2-3-4  trees 

The join operation takes two dynamic sets S' and S'' and an element x such that
for any x' ∈ S' and x'' ∈ S'', we have x'.key < x.key < x''.key. It returns a set
S = S' ∪ {x} ∪ S''. The split operation is like an “inverse” join: given a dynamic
set S and an element x ∈ S, it creates a set S' that consists of all elements in
S − {x} whose keys are less than x.key and a set S'' that consists of all elements
in S − {x} whose keys are greater than x.key. In this problem, we investigate
how to implement these operations on 2-3-4 trees. We assume for convenience that
elements consist only of keys and that all key values are distinct.

a.  Show  how  to  maintain,  for  every  node  x  of  a  2-3-4  tree,  the  height  of  the  subtree 
rooted at x as an attribute x.height. Make sure that your implementation does
not  affect  the  asymptotic  running  times  of  searching,  insertion,  and  deletion. 

b.  Show how to implement the join operation. Given two 2-3-4 trees T' and T''
and a key k, the join operation should run in O(1 + |h' − h''|) time, where h'
and h'' are the heights of T' and T'', respectively.

c.  Consider the simple path p from the root of a 2-3-4 tree T to a given key k,
the set S' of keys in T that are less than k, and the set S'' of keys in T that are
greater than k. Show that p breaks S' into a set of trees {T'_0, T'_1, ..., T'_m} and a
set of keys {k'_1, k'_2, ..., k'_m}, where, for i = 1, 2, ..., m, we have y < k'_i < z
for any keys y ∈ T'_{i−1} and z ∈ T'_i. What is the relationship between the heights
of T'_{i−1} and T'_i? Describe how p breaks S'' into sets of trees and keys.

d.  Show how to implement the split operation on T. Use the join operation to
assemble the keys in S' into a single 2-3-4 tree T' and the keys in S'' into a
single 2-3-4 tree T''. The running time of the split operation should be O(lg n),
where n is the number of keys in T. (Hint: The costs for joining should
telescope.)


Chapter  notes 

Knuth  [211],  Aho,  Hopcroft,  and  Ullman  [5],  and  Sedgewick  [306]  give  further 
discussions of balanced-tree schemes and B-trees. Comer [74] provides a
comprehensive survey of B-trees. Guibas and Sedgewick [155] discuss the relationships
among  various  kinds  of  balanced-tree  schemes,  including  red-black  trees  and  2-3-4 
trees. 

In  1970,  J.  E.  Hopcroft  invented  2-3  trees,  a  precursor  to  B-trees  and  2-3-4 
trees,  in  which  every  internal  node  has  either  two  or  three  children.  Bayer  and 
McCreight  [35]  introduced  B-trees  in  1972;  they  did  not  explain  their  choice  of 
name. 

Bender,  Demaine,  and  Farach-Colton  [40]  studied  how  to  make  B-trees  perform 
well in the presence of memory-hierarchy effects. Their cache-oblivious
algorithms work efficiently without explicitly knowing the data transfer sizes within
the  memory  hierarchy. 

Fibonacci Heaps


The Fibonacci heap data structure serves a dual purpose. First, it supports a set of
operations  that  constitutes  what  is  known  as  a  “mergeable  heap.”  Second,  several 
Fibonacci-heap  operations  run  in  constant  amortized  time,  which  makes  this  data 
structure  well  suited  for  applications  that  invoke  these  operations  frequently. 

Mergeable  heaps 

A  mergeable  heap  is  any  data  structure  that  supports  the  following  five  operations, 
in  which  each  element  has  a  key. 

Make-Heap() creates and returns a new heap containing no elements.

Insert(H, x) inserts element x, whose key has already been filled in, into heap H.

Minimum(H) returns a pointer to the element in heap H whose key is minimum.

Extract-Min(H) deletes the element from heap H whose key is minimum,
returning a pointer to the element.

Union(H_1, H_2) creates and returns a new heap that contains all the elements of
heaps H_1 and H_2. Heaps H_1 and H_2 are “destroyed” by this operation.

In  addition  to  the  mergeable-heap  operations  above,  Fibonacci  heaps  also  support 
the  following  two  operations: 

Decrease-Key(H, x, k) assigns to element x within heap H the new key
value k, which we assume to be no greater than its current key value.¹

Delete(H, x) deletes element x from heap H.


¹ As mentioned in the introduction to Part V, our default mergeable heaps are mergeable
min-heaps, and so the operations Minimum, Extract-Min, and Decrease-Key apply.
Alternatively, we could define a mergeable max-heap with the operations Maximum,
Extract-Max, and Increase-Key.

Procedure       Binary heap      Fibonacci heap
                (worst-case)     (amortized)
-----------------------------------------------
Make-Heap       Θ(1)             Θ(1)
Insert          Θ(lg n)          Θ(1)
Minimum         Θ(1)             Θ(1)
Extract-Min     Θ(lg n)          O(lg n)
Union           Θ(n)             Θ(1)
Decrease-Key    Θ(lg n)          Θ(1)
Delete          Θ(lg n)          O(lg n)

Figure 19.1  Running times for operations on two implementations of mergeable heaps. The number
of items in the heap(s) at the time of an operation is denoted by n.

As the table in Figure 19.1 shows, if we don’t need the Union operation, ordinary
binary heaps, as used in heapsort (Chapter 6), work fairly well. Operations
other than Union run in worst-case time O(lg n) on a binary heap. If we need
to support the Union operation, however, binary heaps perform poorly. By
concatenating the two arrays that hold the binary heaps to be merged and then running
Build-Min-Heap (see Section 6.3), the Union operation takes Θ(n) time in the
worst case.
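
To make the linear-time union concrete, here is a small Python sketch. It assumes the two binary min-heaps are stored as Python lists and uses the standard library's heapq.heapify, which builds a heap in place in linear time, in the role of Build-Min-Heap. The function name binary_heap_union is our own, and unlike the Union operation defined above it leaves its two inputs intact.

```python
import heapq


def binary_heap_union(h1, h2):
    """Merge two array-based min-heaps into a new one in linear time."""
    merged = h1 + h2        # concatenate the two arrays
    heapq.heapify(merged)   # rebuild the heap property over the whole array
    return merged
```

Every element has to be copied and possibly moved, which is why no rearrangement of this idea can beat linear time on the array representation.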

Fibonacci  heaps,  on  the  other  hand,  have  better  asymptotic  time  bounds  than 
binary heaps for the Insert, Union, and Decrease-Key operations, and they
have the same asymptotic running times for the remaining operations. Note,
however, that the running times for Fibonacci heaps in Figure 19.1 are amortized time
bounds, not worst-case per-operation time bounds. The Union operation takes
only  constant  amortized  time  in  a  Fibonacci  heap,  which  is  significantly  better 
than  the  linear  worst-case  time  required  in  a  binary  heap  (assuming,  of  course,  that 
an  amortized  time  bound  suffices). 

Fibonacci  heaps  in  theory  and  practice 

From a theoretical standpoint, Fibonacci heaps are especially desirable when the
number of Extract-Min and Delete operations is small relative to the number
of other operations performed. This situation arises in many applications. For
example, some algorithms for graph problems may call Decrease-Key once per
edge. For dense graphs, which have many edges, the Θ(1) amortized time of each
call of Decrease-Key adds up to a big improvement over the Θ(lg n) worst-case
time of binary heaps. Fast algorithms for problems such as computing minimum
spanning trees (Chapter 23) and finding single-source shortest paths (Chapter 24)
make essential use of Fibonacci heaps.

From a practical point of view, however, the constant factors and programming
complexity of Fibonacci heaps make them less desirable than ordinary binary
(or k-ary) heaps for most applications, except for certain applications that manage
large amounts of data. Thus, Fibonacci heaps are predominantly of theoretical
interest. If a much simpler data structure with the same amortized time bounds as
Fibonacci heaps were developed, it would be of practical use as well.

Both  binary  heaps  and  Fibonacci  heaps  are  inefficient  in  how  they  support  the 
operation  Search;  it  can  take  a  while  to  find  an  element  with  a  given  key.  For  this 
reason, operations such as Decrease-Key and Delete that refer to a given element
require a pointer to that element as part of their input. As in our discussion of
priority  queues  in  Section  6.5,  when  we  use  a  mergeable  heap  in  an  application,  we 
often  store  a  handle  to  the  corresponding  application  object  in  each  mergeable-heap 
element,  as  well  as  a  handle  to  the  corresponding  mergeable-heap  element  in  each 
application  object.  The  exact  nature  of  these  handles  depends  on  the  application 
and  its  implementation. 

Like  several  other  data  structures  that  we  have  seen,  Fibonacci  heaps  are  based 
on  rooted  trees.  We  represent  each  element  by  a  node  within  a  tree,  and  each 
node  has  a  key  attribute.  For  the  remainder  of  this  chapter,  we  shall  use  the  term 
“node”  instead  of  “element.”  We  shall  also  ignore  issues  of  allocating  nodes  prior 
to  insertion  and  freeing  nodes  following  deletion,  assuming  instead  that  the  code 
calling  the  heap  procedures  deals  with  these  details. 

Section  19.1  defines  Fibonacci  heaps,  discusses  how  we  represent  them,  and 
presents  the  potential  function  used  for  their  amortized  analysis.  Section  19.2 
shows  how  to  implement  the  mergeable-heap  operations  and  achieve  the  amortized 
time bounds shown in Figure 19.1. The remaining two operations, Decrease-Key
and Delete, form the focus of Section 19.3. Finally, Section 19.4 finishes a
key  part  of  the  analysis  and  also  explains  the  curious  name  of  the  data  structure. 


19.1  Structure  of  Fibonacci  heaps 

A Fibonacci heap is a collection of rooted trees that are min-heap ordered. That
is, each tree obeys the min-heap property: the key of a node is greater than or equal
to the key of its parent. Figure 19.2(a) shows an example of a Fibonacci heap.

As Figure 19.2(b) shows, each node x contains a pointer x.p to its parent and
a pointer x.child to any one of its children. The children of x are linked together
in a circular, doubly linked list, which we call the child list of x. Each child y in
a child list has pointers y.left and y.right that point to y’s left and right siblings,
respectively. If node y is an only child, then y.left = y.right = y. Siblings may
appear in a child list in any order.

Figure 19.2  (a) A Fibonacci heap consisting of five min-heap-ordered trees and 14 nodes. The
dashed line indicates the root list. The minimum node of the heap is the node containing the key 3.
Black nodes are marked. The potential of this particular Fibonacci heap is 5 + 2 · 3 = 11. (b) A more
complete representation showing pointers p (up arrows), child (down arrows), and left and right
(sideways arrows). The remaining figures in this chapter omit these details, since all the information
shown here can be determined from what appears in part (a).


Circular, doubly linked lists (see Section 10.2) have two advantages for use in
Fibonacci heaps. First, we can insert a node into any location or remove a node
from anywhere in a circular, doubly linked list in O(1) time. Second, given two
such lists, we can concatenate them (or “splice” them together) into one circular,
doubly linked list in O(1) time. In the descriptions of Fibonacci heap operations,
we shall refer to these operations informally, letting you fill in the details of their
implementations if you wish.
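
For readers who do want the details, the following Python sketch shows one way to write the three constant-time list operations. The class Node and the functions insert_left_of, remove, splice, and keys are hypothetical names chosen for this sketch, and only the left and right pointers are modeled.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self   # a one-node circular list


def insert_left_of(a, x):
    """Insert the singleton node x immediately to the left of a."""
    x.right = a
    x.left = a.left
    a.left.right = x
    a.left = x


def remove(x):
    """Unlink x from its list, leaving x as a one-node list."""
    x.left.right = x.right
    x.right.left = x.left
    x.left = x.right = x


def splice(a, b):
    """Concatenate the two distinct circular lists containing a and b."""
    a_right, b_left = a.right, b.left
    a.right = b
    b.left = a
    b_left.right = a_right
    a_right.left = b_left


def keys(start):
    """Collect keys by following right pointers once around the list."""
    out, node = [], start
    while True:
        out.append(node.key)
        node = node.right
        if node is start:
            return out
```

Each of the first three functions touches a fixed number of pointers regardless of list length, which is what makes them constant-time. Only keys, a helper for inspecting a list, walks the whole ring.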

Each node has two other attributes. We store the number of children in the child
list of node x in x.degree. The boolean-valued attribute x.mark indicates whether
node x has lost a child since the last time x was made the child of another node.
Newly created nodes are unmarked, and a node x becomes unmarked whenever it
is made the child of another node. Until we look at the Decrease-Key operation
in Section 19.3, we will just set all mark attributes to FALSE.

We access a given Fibonacci heap H by a pointer H.min to the root of a tree
containing the minimum key; we call this node the minimum node of the Fibonacci
heap. If more than one root has a key with the minimum value, then any such root
may serve as the minimum node. When a Fibonacci heap H is empty, H.min
is NIL.

The roots of all the trees in a Fibonacci heap are linked together using their
left and right pointers into a circular, doubly linked list called the root list of the
Fibonacci heap. The pointer H.min thus points to the node in the root list whose
key is minimum. Trees may appear in any order within a root list.

We rely on one other attribute for a Fibonacci heap H: H.n, the number of
nodes currently in H.

Potential  function 

As mentioned, we shall use the potential method of Section 17.3 to analyze the
performance of Fibonacci heap operations. For a given Fibonacci heap H, we
indicate by t(H) the number of trees in the root list of H and by m(H) the number
of marked nodes in H. We then define the potential Φ(H) of Fibonacci heap H
by

Φ(H) = t(H) + 2 m(H) .    (19.1)

(We will gain some intuition for this potential function in Section 19.3.) For
example, the potential of the Fibonacci heap shown in Figure 19.2 is 5 + 2 · 3 = 11. The
potential of a set of Fibonacci heaps is the sum of the potentials of its constituent
Fibonacci heaps. We shall assume that a unit of potential can pay for a constant
amount of work, where the constant is sufficiently large to cover the cost of any of
the specific constant-time pieces of work that we might encounter.
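
As a small worked example of equation (19.1), the Python sketch below computes the potential of a heap given as a list of root nodes. The Node class here is a simplified stand-in with a plain Python list of children instead of circular lists, and the sample heap is a made-up shape with five trees, 14 nodes, and three marked nodes, chosen only to match the counts quoted for Figure 19.2.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    key: int
    mark: bool = False
    children: list = field(default_factory=list)


def potential(roots):
    """Return t(H) + 2 m(H) for the heap whose root list is `roots`."""
    def marked(node):
        # count marked nodes in the tree rooted at `node`
        return int(node.mark) + sum(marked(c) for c in node.children)
    return len(roots) + 2 * sum(marked(r) for r in roots)


roots = [
    Node(23),
    Node(7),
    Node(3, children=[
        Node(18, mark=True, children=[Node(39, mark=True)]),
        Node(52),
        Node(38, children=[Node(41)]),
    ]),
    Node(17, children=[Node(30)]),
    Node(24, children=[Node(26, mark=True, children=[Node(35)]), Node(46)]),
]
print(potential(roots))  # 5 trees + 2 * 3 marked nodes = 11
```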

We  assume  that  a  Fibonacci  heap  application  begins  with  no  heaps.  The  initial 
potential,  therefore,  is  0,  and  by  equation  (19.1),  the  potential  is  nonnegative  at 
all  subsequent  times.  From  equation  (17.3),  an  upper  bound  on  the  total  amortized 
cost  provides  an  upper  bound  on  the  total  actual  cost  for  the  sequence  of  operations. 

Maximum  degree 

The amortized analyses we shall perform in the remaining sections of this chapter
assume that we know an upper bound D(n) on the maximum degree of any node
in an n-node Fibonacci heap. We won’t prove it, but when only the mergeable-heap
operations are supported, D(n) ≤ ⌊lg n⌋. (Problem 19-2(d) asks you to prove
this property.) In Sections 19.3 and 19.4, we shall show that when we support
Decrease-Key and Delete as well, D(n) = O(lg n).


19.2  Mergeable-heap operations

The mergeable-heap operations on Fibonacci heaps delay work as long as possible.
The various operations have performance trade-offs. For example, we insert a node
by adding it to the root list, which takes just constant time. If we were to start
with an empty Fibonacci heap and then insert k nodes, the Fibonacci heap would
consist of just a root list of k nodes. The trade-off is that if we then perform
an Extract-Min operation on Fibonacci heap H, after removing the node that
H.min points to, we would have to look through each of the remaining k − 1 nodes
in the root list to find the new minimum node. As long as we have to go through
the entire root list during the Extract-Min operation, we also consolidate nodes
into min-heap-ordered trees to reduce the size of the root list. We shall see that, no
matter what the root list looks like before an Extract-Min operation, afterward
each node in the root list has a degree that is unique within the root list, which leads
to a root list of size at most D(n) + 1.

Creating  a  new  Fibonacci  heap 

To make an empty Fibonacci heap, the Make-Fib-Heap procedure allocates and
returns the Fibonacci heap object H, where H.n = 0 and H.min = NIL; there
are no trees in H. Because t(H) = 0 and m(H) = 0, the potential of the empty
Fibonacci heap is Φ(H) = 0. The amortized cost of Make-Fib-Heap is thus
equal to its O(1) actual cost.

Inserting  a  node 

The  following  procedure  inserts  node  x  into  Fibonacci  heap  H ,  assuming  that  the 
node  has  already  been  allocated  and  that  x.key  has  already  been  filled  in. 

Fib-Heap-Insert(H, x)

 1  x.degree = 0
 2  x.p = NIL
 3  x.child = NIL
 4  x.mark = FALSE
 5  if H.min == NIL
 6      create a root list for H containing just x
 7      H.min = x
 8  else insert x into H’s root list
 9      if x.key < H.min.key
10          H.min = x
11  H.n = H.n + 1

Figure 19.3 Inserting a node into a Fibonacci heap. (a) A Fibonacci heap H. (b) Fibonacci heap H
after inserting the node with key 21. The node becomes its own min-heap-ordered tree and is then
added to the root list, becoming the left sibling of the root.

Lines 1-4 initialize some of the structural attributes of node x. Line 5 tests to see
whether Fibonacci heap H is empty. If it is, then lines 6-7 make x the only
node in H’s root list and set H.min to point to x. Otherwise, lines 8-10 insert x
into H’s root list and update H.min if necessary. Finally, line 11 increments H.n
to reflect the addition of the new node. Figure 19.3 shows a node with key 21
inserted into the Fibonacci heap of Figure 19.2.

To determine the amortized cost of Fib-Heap-Insert, let H be the input Fibonacci
heap and H' be the resulting Fibonacci heap. Then, t(H') = t(H) + 1
and m(H') = m(H), and the increase in potential is

((t(H) + 1) + 2 m(H)) - (t(H) + 2 m(H)) = 1 .

Since the actual cost is O(1), the amortized cost is O(1) + 1 = O(1).
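
The insertion procedure can be sketched in Python as follows. This is a minimal illustration, assuming the root list is a circular doubly linked list threaded through left and right pointers. The names Node, FibHeap, fib_heap_insert, and root_keys are hypothetical and are not part of the pseudocode.

```python
class Node:
    """A Fibonacci-heap node; left/right form a circular doubly linked list."""
    def __init__(self, key):
        self.key = key
        self.degree = 0
        self.p = None
        self.child = None
        self.mark = False
        self.left = self.right = self   # a singleton circular list


class FibHeap:
    def __init__(self):                 # plays the role of Make-Fib-Heap
        self.min = None
        self.n = 0


def fib_heap_insert(H, x):
    """Add node x to H's root list with O(1) pointer updates and no consolidation."""
    x.degree, x.p, x.child, x.mark = 0, None, None, False
    if H.min is None:
        x.left = x.right = x            # root list containing just x
        H.min = x
    else:
        # splice x in as the left sibling of H.min
        x.right = H.min
        x.left = H.min.left
        H.min.left.right = x
        H.min.left = x
        if x.key < H.min.key:
            H.min = x
    H.n += 1


def root_keys(H):
    """Walk the root list once, starting at H.min and following right pointers."""
    keys, node = [], H.min
    if node is not None:
        while True:
            keys.append(node.key)
            node = node.right
            if node is H.min:
                break
    return keys


H = FibHeap()
for key in [23, 7, 21, 3, 17]:
    fib_heap_insert(H, Node(key))
print(H.min.key, H.n, len(root_keys(H)))
```

Because nothing is consolidated, the five insertions leave five one-node trees in the root list, which is the deferred work that a later Extract-Min has to absorb.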

Finding  the  minimum  node 

The minimum node of a Fibonacci heap H is given by the pointer H.min, so we
can find the minimum node in O(1) actual time. Because the potential of H does
not change, the amortized cost of this operation is equal to its O(1) actual cost.

Uniting  two  Fibonacci  heaps 

The following procedure unites Fibonacci heaps H1 and H2, destroying H1 and H2
in the process. It simply concatenates the root lists of H1 and H2 and then determines
the new minimum node. Afterward, the objects representing H1 and H2 will
never be used again.

Fib-Heap-Union(H1, H2)

1  H = Make-Fib-Heap()
2  H.min = H1.min
3  concatenate the root list of H2 with the root list of H
4  if (H1.min == NIL) or (H2.min ≠ NIL and H2.min.key < H1.min.key)
5      H.min = H2.min
6  H.n = H1.n + H2.n
7  return H

Lines 1-3 concatenate the root lists of H1 and H2 into a new root list H. Lines
2, 4, and 5 set the minimum node of H, and line 6 sets H.n to the total number
of nodes. Line 7 returns the resulting Fibonacci heap H. As in the Fib-Heap-Insert
procedure, all roots remain roots.

The change in potential is

Φ(H) - (Φ(H1) + Φ(H2))
  = (t(H) + 2 m(H)) - ((t(H1) + 2 m(H1)) + (t(H2) + 2 m(H2)))
  = 0,

because t(H) = t(H1) + t(H2) and m(H) = m(H1) + m(H2). The amortized
cost of Fib-Heap-Union is therefore equal to its O(1) actual cost.
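
The constant-time concatenation can be sketched in Python as below, assuming circular doubly linked root lists that are joined by rewiring four pointers. The class and function names are hypothetical, and clearing the input heaps at the end is an illustrative way of honoring the rule that H1 and H2 are never used again.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self   # singleton circular list


class FibHeap:
    def __init__(self):
        self.min = None
        self.n = 0

    def insert(self, key):
        x = Node(key)
        if self.min is None:
            self.min = x
        else:
            x.right, x.left = self.min, self.min.left
            self.min.left.right = x
            self.min.left = x
            if key < self.min.key:
                self.min = x
        self.n += 1


def fib_heap_union(H1, H2):
    """Return a new heap holding the roots of H1 and H2, using O(1) pointer work."""
    H = FibHeap()
    H.min = H1.min
    if H1.min is not None and H2.min is not None:
        # splice the two circular lists together
        a, b = H1.min, H2.min
        a_next, b_next = a.right, b.right
        a.right, b_next.left = b_next, a
        b.right, a_next.left = a_next, b
    if H1.min is None or (H2.min is not None and H2.min.key < H1.min.key):
        H.min = H2.min
    H.n = H1.n + H2.n
    H1.min = H2.min = None              # the inputs must not be used again
    H1.n = H2.n = 0
    return H


def root_keys(H):
    """Collect the keys on the root list by following right pointers once around."""
    keys, node = [], H.min
    if node is not None:
        while True:
            keys.append(node.key)
            node = node.right
            if node is H.min:
                break
    return keys


H1, H2 = FibHeap(), FibHeap()
for k in (23, 7, 21):
    H1.insert(k)
for k in (18, 3):
    H2.insert(k)
H = fib_heap_union(H1, H2)
print(H.min.key, H.n, sorted(root_keys(H)))
```

No node is visited during the union itself, so the cost does not depend on the sizes of the two heaps.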

Extracting  the  minimum  node 

The process of extracting the minimum node is the most complicated of the operations
presented in this section. It is also where the delayed work of consolidating
trees in the root list finally occurs. The following pseudocode extracts the minimum
node. The code assumes for convenience that when a node is removed from
a linked list, pointers remaining in the list are updated, but pointers in the extracted
node are left unchanged. It also calls the auxiliary procedure CONSOLIDATE,
which we shall see shortly.

Fib-Heap-Extract-Min(H)

 1  z = H.min
 2  if z ≠ NIL
 3      for each child x of z
 4          add x to the root list of H
 5          x.p = NIL
 6      remove z from the root list of H
 7      if z == z.right
 8          H.min = NIL
 9      else H.min = z.right
10          Consolidate(H)
11      H.n = H.n - 1
12  return z

As  Figure  19.4  illustrates,  Fib-Heap-Extract-Min  works  by  first  making  a  root 
out  of  each  of  the  minimum  node’s  children  and  removing  the  minimum  node  from 
the  root  list.  It  then  consolidates  the  root  list  by  linking  roots  of  equal  degree  until 
at  most  one  root  remains  of  each  degree. 

We  start  in  line  1  by  saving  a  pointer  z  to  the  minimum  node;  the  procedure 
returns  this  pointer  at  the  end.  If  z  is  NIL,  then  Fibonacci  heap  H  is  already  empty 
and we are done. Otherwise, we delete node z from H by making all of z’s children
roots of H in lines 3-5 (putting them into the root list) and removing z from
the  root  list  in  line  6.  If  z  is  its  own  right  sibling  after  line  6,  then  z  was  the 
only  node  on  the  root  list  and  it  had  no  children,  so  all  that  remains  is  to  make 
the  Fibonacci  heap  empty  in  line  8  before  returning  z.  Otherwise,  we  set  the 
pointer  H.min  into  the  root  list  to  point  to  a  root  other  than  z  (in  this  case,  z’s 
right  sibling),  which  is  not  necessarily  going  to  be  the  new  minimum  node  when 
Fib-Heap-Extract-Min  is  done.  Figure  19.4(b)  shows  the  Fibonacci  heap  of 
Figure  19.4(a)  after  executing  line  9. 

The next step, in which we reduce the number of trees in the Fibonacci heap, is
consolidating the root list of H, which the call CONSOLIDATE(H) accomplishes.
Consolidating the root list consists of repeatedly executing the following steps until
every root in the root list has a distinct degree value:

1. Find two roots x and y in the root list with the same degree. Without loss of
generality, let x.key ≤ y.key.

2. Link y to x: remove y from the root list, and make y a child of x by calling the
Fib-Heap-Link procedure. This procedure increments the attribute x.degree
and clears the mark on y.


Figure 19.4 The action of Fib-Heap-Extract-Min. (a) A Fibonacci heap H. (b) The situation
after removing the minimum node z from the root list and adding its children to the root list.
(c)-(e) The array A and the trees after each of the first three iterations of the for loop of lines 4-14 of
the procedure CONSOLIDATE. The procedure processes the root list by starting at the node pointed
to by H.min and following right pointers. Each part shows the values of w and x at the end of an
iteration. (f)-(h) The next iteration of the for loop, with the values of w and x shown at the end of
each iteration of the while loop of lines 7-13. Part (f) shows the situation after the first time through
the while loop. The node with key 23 has been linked to the node with key 7, which x now points to.
In part (g), the node with key 17 has been linked to the node with key 7, which x still points to. In
part (h), the node with key 24 has been linked to the node with key 7. Since no node was previously
pointed to by A[3], at the end of the for loop iteration, A[3] is set to point to the root of the resulting
tree.

Figure 19.4, continued (i)-(l) The situation after each of the next four iterations of the for loop.
(m) Fibonacci heap H after reconstructing the root list from the array A and determining the new
H.min pointer.


The procedure CONSOLIDATE uses an auxiliary array A[0 . . D(H.n)] to keep
track of roots according to their degrees. If A[i] = y, then y is currently a root
with y.degree = i. Of course, in order to allocate the array we have to know how
to calculate the upper bound D(H.n) on the maximum degree, but we will see how
to do so in Section 19.4.

Consolidate(H)

 1  let A[0 . . D(H.n)] be a new array
 2  for i = 0 to D(H.n)
 3      A[i] = NIL
 4  for each node w in the root list of H
 5      x = w
 6      d = x.degree
 7      while A[d] ≠ NIL
 8          y = A[d]        // another node with the same degree as x
 9          if x.key > y.key
10              exchange x with y
11          Fib-Heap-Link(H, y, x)
12          A[d] = NIL
13          d = d + 1
14      A[d] = x
15  H.min = NIL
16  for i = 0 to D(H.n)
17      if A[i] ≠ NIL
18          if H.min == NIL
19              create a root list for H containing just A[i]
20              H.min = A[i]
21          else insert A[i] into H’s root list
22              if A[i].key < H.min.key
23                  H.min = A[i]

Fib-Heap-Link(H, y, x)

1  remove y from the root list of H
2  make y a child of x, incrementing x.degree
3  y.mark = FALSE

In  detail,  the  CONSOLIDATE  procedure  works  as  follows.  Lines  1-3  allocate 
and  initialize  the  array  A  by  making  each  entry  NIL.  The  for  loop  of  lines  4-14 
processes  each  root  w  in  the  root  list.  As  we  link  roots  together,  w  may  be  linked 
to  some  other  node  and  no  longer  be  a  root.  Nevertheless,  w  is  always  in  a  tree 
rooted  at  some  node  x,  which  may  or  may  not  be  w  itself.  Because  we  want  at 
most  one  root  with  each  degree,  we  look  in  the  array  A  to  see  whether  it  contains 
a root y with the same degree as x. If it does, then we link the roots x and y while
guaranteeing that x remains a root after linking. That is, we link y to x after first
exchanging the pointers to the two roots if y’s key is smaller than x’s key. After
we link y to x, the degree of x has increased by 1, and so we continue this process,
linking x and another root whose degree equals x’s new degree, until no other root
that we have processed has the same degree as x. We then set the appropriate entry
of  A  to  point  to  x,  so  that  as  we  process  roots  later  on,  we  have  recorded  that  x  is 
the  unique  root  of  its  degree  that  we  have  already  processed.  When  this  for  loop 
terminates,  at  most  one  root  of  each  degree  will  remain,  and  the  array  A  will  point 
to  each  remaining  root. 

The  while  loop  of  lines  7-13  repeatedly  links  the  root  x  of  the  tree  containing 
node  w  to  another  tree  whose  root  has  the  same  degree  as  x,  until  no  other  root  has 
the  same  degree.  This  while  loop  maintains  the  following  invariant: 

At the start of each iteration of the while loop, d = x.degree.

We  use  this  loop  invariant  as  follows: 

Initialization:  Line  6  ensures  that  the  loop  invariant  holds  the  first  time  we  enter 
the  loop. 

Maintenance: In each iteration of the while loop, A[d] points to some root y.
Because d = x.degree = y.degree, we want to link x and y. Whichever of
x and y has the smaller key becomes the parent of the other as a result of the
link operation, and so lines 9-10 exchange the pointers to x and y if necessary.
Next, we link y to x by the call Fib-Heap-Link(H, y, x) in line 11. This
call increments x.degree but leaves y.degree as d. Node y is no longer a root,
and so line 12 removes the pointer to it in array A. Because the call of
Fib-Heap-Link increments the value of x.degree, line 13 restores the invariant
that d = x.degree.

Termination:  We  repeat  the  while  loop  until  A[d]  =  NIL,  in  which  case  there  is 
no  other  root  with  the  same  degree  as  x. 

After  the  while  loop  terminates,  we  set  A[d]  to  x  in  line  14  and  perform  the  next 
iteration  of  the  for  loop. 
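
To see the loop invariant at work without any pointer manipulation, the following Python sketch tracks only the degrees of the roots. It is an illustration under simplifying assumptions: keys are ignored, a set of occupied degrees stands in for the array A, and the function name consolidate_degrees is hypothetical. The carry-like behavior of the while loop then becomes visible, since k roots of degree 0 end up as trees whose degrees are the positions of the 1 bits of k.

```python
def consolidate_degrees(root_degrees):
    """Simulate the for/while loops of Consolidate on degrees only."""
    A = set()                     # degrees that already have a processed root
    for d in root_degrees:        # line 6: d = x.degree for the next root
        while d in A:             # invariant: d is the degree of the current root x
            A.remove(d)           # line 12: A[d] = NIL
            d += 1                # line 13: the link raised x's degree by one
        A.add(d)                  # line 14: A[d] = x
    return sorted(A)


for k in (1, 6, 13):
    print(k, bin(k), consolidate_degrees([0] * k))
```

Each pass through the while loop removes one root from consideration, which is why the total number of while-loop iterations is bounded by the number of roots.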

Figures 19.4(c)-(e) show the array A and the resulting trees after the first three
iterations of the for loop of lines 4-14. In the next iteration of the for loop, three
links occur; their results are shown in Figures 19.4(f)-(h). Figures 19.4(i)-(l) show
the result of the next four iterations of the for loop.

All that remains is to clean up. Once the for loop of lines 4-14 completes,
line 15 empties the root list, and lines 16-23 reconstruct it from the array A. The
resulting Fibonacci heap appears in Figure 19.4(m). After consolidating the root
list, Fib-Heap-Extract-Min finishes up by decrementing H.n in line 11 and
returning a pointer to the deleted node z in line 12.
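
The extraction and consolidation steps described above can be sketched in Python as follows. This is a simplified illustration, not a faithful implementation. It assumes Python lists in place of the circular linked lists (so removing z from the root list is not constant time), uses len(children) as the degree, and uses a dictionary keyed by degree in place of the array A so that D(n) need not be computed. All class and method names are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.p = None
        self.children = []      # len(children) plays the role of degree
        self.mark = False


class FibHeap:
    def __init__(self):
        self.roots = []         # root list, kept as a plain Python list here
        self.min = None
        self.n = 0

    def insert(self, key):
        x = Node(key)
        self.roots.append(x)
        if self.min is None or key < self.min.key:
            self.min = x
        self.n += 1
        return x

    def _link(self, y, x):
        """Make root y a child of root x and clear y's mark."""
        x.children.append(y)
        y.p = x
        y.mark = False

    def _consolidate(self):
        A = {}                  # degree -> the unique processed root of that degree
        for w in self.roots:
            x = w
            d = len(x.children)
            while d in A:       # invariant: d == len(x.children)
                y = A.pop(d)    # another root with the same degree as x
                if x.key > y.key:
                    x, y = y, x
                self._link(y, x)
                d += 1
            A[d] = x
        self.roots = list(A.values())
        self.min = min(self.roots, key=lambda r: r.key, default=None)

    def extract_min(self):
        z = self.min
        if z is not None:
            for x in z.children:        # promote z's children to roots
                x.p = None
                self.roots.append(x)
            self.roots.remove(z)
            if not self.roots:
                self.min = None
            else:
                self._consolidate()
            self.n -= 1
        return z


H = FibHeap()
for key in [23, 7, 21, 3, 18, 52, 38, 39, 41, 17, 30, 24, 26, 46, 35]:
    H.insert(key)
first = H.extract_min().key
degrees = sorted(len(r.children) for r in H.roots)
rest = []
while H.n > 0:
    rest.append(H.extract_min().key)
print(first, degrees, rest)
```

After the first extraction the 14 remaining one-node trees have been linked into three trees of distinct degrees, and repeated extraction returns the remaining keys in sorted order.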

We are now ready to show that the amortized cost of extracting the minimum
node of an n-node Fibonacci heap is O(D(n)). Let H denote the Fibonacci heap
just prior to the Fib-Heap-Extract-Min operation.

We start by accounting for the actual cost of extracting the minimum node.
An O(D(n)) contribution comes from Fib-Heap-Extract-Min processing at
most D(n) children of the minimum node and from the work in lines 2-3 and
16-23 of CONSOLIDATE. It remains to analyze the contribution from the for loop
of lines 4-14 in CONSOLIDATE, for which we use an aggregate analysis. The size
of the root list upon calling CONSOLIDATE is at most D(n) + t(H) - 1, since it
consists of the original t(H) root-list nodes, minus the extracted root node, plus
the children of the extracted node, which number at most D(n). Within a given
iteration of the for loop of lines 4-14, the number of iterations of the while loop of
lines 7-13 depends on the root list. But we know that every time through the while
loop, one of the roots is linked to another, and thus the total number of iterations
of the while loop over all iterations of the for loop is at most the number of roots
in the root list. Hence, the total amount of work performed in the for loop is at
most proportional to D(n) + t(H). Thus, the total actual work in extracting the
minimum node is O(D(n) + t(H)).

The potential before extracting the minimum node is t(H) + 2 m(H), and the
potential afterward is at most (D(n) + 1) + 2 m(H), since at most D(n) + 1 roots
remain and no nodes become marked during the operation. The amortized cost is
thus at most

O(D(n) + t(H)) + ((D(n) + 1) + 2 m(H)) - (t(H) + 2 m(H))
  = O(D(n)) + O(t(H)) - t(H)
  = O(D(n)),

since we can scale up the units of potential to dominate the constant hidden
in O(t(H)). Intuitively, the cost of performing each link is paid for by the reduction
in potential due to the link’s reducing the number of roots by one. We shall
see in Section 19.4 that D(n) = O(lg n), so that the amortized cost of extracting
the minimum node is O(lg n).

Exercises 


19.2-1 

Show  the  Fibonacci  heap  that  results  from  calling  Fib-Heap-Extract-Min  on 
the  Fibonacci  heap  shown  in  Figure  19.4(m). 


19.3  Decreasing  a  key  and  deleting  a  node 

In this section, we show how to decrease the key of a node in a Fibonacci heap
in O(1) amortized time and how to delete any node from an n-node Fibonacci
heap in O(D(n)) amortized time. In Section 19.4, we will show that the maximum
degree D(n) is O(lg n), which will imply that Fib-Heap-Extract-Min
and Fib-Heap-Delete run in O(lg n) amortized time.

Decreasing  a  key 

In the following pseudocode for the operation Fib-Heap-Decrease-Key, we
assume as before that removing a node from a linked list does not change any of
the structural attributes in the removed node.

Fib-Heap-Decrease-Key(H, x, k)

1  if k > x.key
2      error “new key is greater than current key”
3  x.key = k
4  y = x.p
5  if y ≠ NIL and x.key < y.key
6      Cut(H, x, y)
7      Cascading-Cut(H, y)
8  if x.key < H.min.key
9      H.min = x

Cut(H, x, y)

1  remove x from the child list of y, decrementing y.degree
2  add x to the root list of H
3  x.p = NIL
4  x.mark = FALSE

Cascading-Cut(H, y)

1  z = y.p
2  if z ≠ NIL
3      if y.mark == FALSE
4          y.mark = TRUE
5      else Cut(H, y, z)
6          Cascading-Cut(H, z)

The Fib-Heap-Decrease-Key procedure works as follows. Lines 1-3 ensure
that the new key is no greater than the current key of x and then assign the new key
to x. If x is a root or if x.key ≥ y.key, where y is x’s parent, then no structural
changes need occur, since min-heap order has not been violated. Lines 4-5 test for
this condition.

If min-heap order has been violated, many changes may occur. We start by
cutting x in line 6. The Cut procedure “cuts” the link between x and its parent y,
making x a root.

We use the mark attributes to obtain the desired time bounds. They record a little
piece of the history of each node. Suppose that the following events have happened
to node x:

1. at some time, x was a root,

2. then x was linked to (made the child of) another node,

3. then two children of x were removed by cuts.

As soon as the second child has been lost, we cut x from its parent, making it a new
root. The attribute x.mark is TRUE if steps 1 and 2 have occurred and one child
of x has been cut. The CUT procedure, therefore, clears x.mark in line 4, since it
performs step 1. (We can now see why line 3 of Fib-Heap-Link clears y.mark:
node y is being linked to another node, and so step 2 is being performed. The next
time a child of y is cut, y.mark will be set to TRUE.)

We  are  not  yet  done,  because  x  might  be  the  second  child  cut  from  its  parent  y 
since the time that y was linked to another node. Therefore, line 7 of
Fib-Heap-Decrease-Key attempts to perform a cascading-cut operation on y. If y is a
root,  then  the  test  in  line  2  of  Cascading-Cut  causes  the  procedure  to  just  return. 
If  y  is  unmarked,  the  procedure  marks  it  in  line  4,  since  its  first  child  has  just  been 
cut,  and  returns.  If  y  is  marked,  however,  it  has  just  lost  its  second  child;  y  is  cut 
in  line  5,  and  Cascading-Cut  calls  itself  recursively  in  line  6  on  y’s  parent  z. 
The  Cascading-Cut  procedure  recurses  its  way  up  the  tree  until  it  finds  either  a 
root  or  an  unmarked  node. 
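
The interplay of Cut, Cascading-Cut, and the mark attribute can be sketched in Python as below. This is a minimal illustration that assumes Python lists for the root list and child lists (so the degree is len(children)). The small tree is hypothetical, loosely following the keys mentioned in the caption of Figure 19.5 rather than reproducing the figure, and all class and function names are assumptions.

```python
class Node:
    def __init__(self, key, parent=None, mark=False):
        self.key = key
        self.p = parent
        self.children = []
        self.mark = mark
        if parent is not None:
            parent.children.append(self)


class FibHeap:
    def __init__(self, root):
        self.roots = [root]
        self.min = root


def cut(H, x, y):
    """Remove x from y's child list and make it an unmarked root."""
    y.children.remove(x)          # y's degree is len(y.children)
    H.roots.append(x)
    x.p = None
    x.mark = False


def cascading_cut(H, y):
    z = y.p
    if z is not None:             # if y is a root there is nothing to do
        if not y.mark:
            y.mark = True         # y has now lost its first child
        else:
            cut(H, y, z)          # y has lost a second child
            cascading_cut(H, z)


def fib_heap_decrease_key(H, x, k):
    if k > x.key:
        raise ValueError("new key is greater than current key")
    x.key = k
    y = x.p
    if y is not None and x.key < y.key:
        cut(H, x, y)
        cascading_cut(H, y)
    if x.key < H.min.key:
        H.min = x


# Hypothetical tree: 7 is the root, 24 its child, 24 has children 26 (marked)
# and 46, and 26 has child 35.
n7 = Node(7)
n24 = Node(24, n7)
n26 = Node(26, n24, mark=True)
n35 = Node(35, n26)
n46 = Node(46, n24)
H = FibHeap(n7)

fib_heap_decrease_key(H, n46, 15)   # no cascading cut; 24 becomes marked
after_first = ([r.key for r in H.roots], n24.mark)
fib_heap_decrease_key(H, n35, 5)    # cuts 35, then 26, then 24
print(after_first, [r.key for r in H.roots], H.min.key)
```

The second call stops at the node with key 7 because it is a root, and every node cut along the way arrives in the root list unmarked.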

Once all the cascading cuts have occurred, lines 8-9 of Fib-Heap-Decrease-Key
finish up by updating H.min if necessary. The only node whose key changed
was the node x whose key decreased. Thus, the new minimum node is either the
original minimum node or node x.

Figure 19.5 shows the execution of two calls of Fib-Heap-Decrease-Key,
starting with the Fibonacci heap shown in Figure 19.5(a). The first call, shown
in Figure 19.5(b), involves no cascading cuts. The second call, shown in
Figures 19.5(c)-(e), invokes two cascading cuts.

We shall now show that the amortized cost of Fib-Heap-Decrease-Key is
only O(1). We start by determining its actual cost. The Fib-Heap-Decrease-Key
procedure takes O(1) time, plus the time to perform the cascading cuts. Suppose
that a given invocation of Fib-Heap-Decrease-Key results in c calls of
Cascading-Cut (the call made from line 7 of Fib-Heap-Decrease-Key followed
by c - 1 recursive calls of Cascading-Cut). Each call of Cascading-Cut
takes O(1) time exclusive of recursive calls. Thus, the actual cost of
Fib-Heap-Decrease-Key, including all recursive calls, is O(c).

Figure 19.5 Two calls of Fib-Heap-Decrease-Key. (a) The initial Fibonacci heap. (b) The
node with key 46 has its key decreased to 15. The node becomes a root, and its parent (with key 24),
which had previously been unmarked, becomes marked. (c)-(e) The node with key 35 has its key
decreased to 5. In part (c), the node, now with key 5, becomes a root. Its parent, with key 26,
is marked, so a cascading cut occurs. The node with key 26 is cut from its parent and made an
unmarked root in (d). Another cascading cut occurs, since the node with key 24 is marked as well.
This node is cut from its parent and made an unmarked root in part (e). The cascading cuts stop
at this point, since the node with key 7 is a root. (Even if this node were not a root, the cascading
cuts would stop, since it is unmarked.) Part (e) shows the result of the Fib-Heap-Decrease-Key
operation, with H.min pointing to the new minimum node.

We next compute the change in potential. Let H denote the Fibonacci heap just
prior to the Fib-Heap-Decrease-Key operation. The call to Cut in line 6 of
Fib-Heap-Decrease-Key creates a new tree rooted at node x and clears x’s
mark bit (which may have already been FALSE). Each call of Cascading-Cut,
except for the last one, cuts a marked node and clears the mark bit. Afterward, the
Fibonacci heap contains t(H) + c trees (the original t(H) trees, c - 1 trees produced
by cascading cuts, and the tree rooted at x) and at most m(H) - c + 2 marked nodes
(c - 1 were unmarked by cascading cuts and the last call of Cascading-Cut may
have marked a node). The change in potential is therefore at most

((t(H) + c) + 2 (m(H) - c + 2)) - (t(H) + 2 m(H)) = 4 - c .

Thus, the amortized cost of Fib-Heap-Decrease-Key is at most

O(c) + 4 - c = O(1) ,

since we can scale up the units of potential to dominate the constant hidden in O(c).
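
As a quick arithmetic check of the bound on the change in potential, the following Python sketch evaluates the potential t(H) + 2 m(H) before and after a decrease-key with c calls of Cascading-Cut, using the worst-case counts from the analysis. The function names and the sample values of t and m are hypothetical.

```python
def potential(t, m):
    """Potential of a heap with t trees and m marked nodes: t + 2m."""
    return t + 2 * m


def worst_case_potential_change(t, m, c):
    """Upper bound on the change in potential for c calls of Cascading-Cut."""
    trees_after = t + c               # c - 1 cascading cuts plus the cut of x
    marked_after = m - c + 2          # upper bound on marked nodes afterward
    return potential(trees_after, marked_after) - potential(t, m)


for c in (1, 2, 5, 10):
    # each printed value equals 4 - c, independent of t and m
    print(c, worst_case_potential_change(t=8, m=12, c=c))
```

The result does not depend on t or m, which is why a long chain of cascading cuts releases enough potential to pay for itself.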

You  can  now  see  why  we  defined  the  potential  function  to  include  a  term  that  is 
twice  the  number  of  marked  nodes.  When  a  marked  node  y  is  cut  by  a  cascading 
cut,  its  mark  bit  is  cleared,  which  reduces  the  potential  by  2.  One  unit  of  potential 
pays  for  the  cut  and  the  clearing  of  the  mark  bit,  and  the  other  unit  compensates 
for  the  unit  increase  in  potential  due  to  node  y  becoming  a  root. 

Deleting  a  node 

The following pseudocode deletes a node from an n-node Fibonacci heap in
O(D(n)) amortized time. We assume that there is no key value of -∞ currently
in the Fibonacci heap.

Fib-Heap-Delete(H, x)

1  Fib-Heap-Decrease-Key(H, x, -∞)
2  Fib-Heap-Extract-Min(H)

Fib-Heap-Delete makes x become the minimum node in the Fibonacci heap by
giving it a uniquely small key of -∞. The Fib-Heap-Extract-Min procedure
then removes node x from the Fibonacci heap. The amortized time of
Fib-Heap-Delete is the sum of the O(1) amortized time of Fib-Heap-Decrease-Key
and the O(D(n)) amortized time of Fib-Heap-Extract-Min. Since we shall see
in Section 19.4 that D(n) = O(lg n), the amortized time of Fib-Heap-Delete
is O(lg n).
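
Deletion as the composition of a key decrease to -∞ followed by an extraction can be sketched in Python as below. This is an illustrative sketch under simplifying assumptions: Python lists replace the circular linked lists (so some steps are not constant time), a dictionary replaces the array A, and math.inf supplies the -∞ key. All class and method names are hypothetical.

```python
import math


class Node:
    def __init__(self, key):
        self.key, self.p, self.children, self.mark = key, None, [], False


class FibHeap:
    """List-based sketch: Python lists stand in for the circular linked lists."""

    def __init__(self):
        self.roots, self.min, self.n = [], None, 0

    def insert(self, key):
        x = Node(key)
        self.roots.append(x)
        if self.min is None or key < self.min.key:
            self.min = x
        self.n += 1
        return x

    def extract_min(self):
        z = self.min
        if z is not None:
            for x in z.children:
                x.p = None
            self.roots.extend(z.children)
            self.roots.remove(z)
            self._consolidate()
            self.n -= 1
        return z

    def _consolidate(self):
        A = {}                                   # degree -> root of that degree
        for x in self.roots:
            d = len(x.children)
            while d in A:
                y = A.pop(d)
                if x.key > y.key:
                    x, y = y, x
                x.children.append(y)             # link y under x
                y.p, y.mark = x, False
                d += 1
            A[d] = x
        self.roots = list(A.values())
        self.min = min(self.roots, key=lambda r: r.key, default=None)

    def decrease_key(self, x, k):
        if k > x.key:
            raise ValueError("new key is greater than current key")
        x.key = k
        y = x.p
        if y is not None and x.key < y.key:
            self._cut(x, y)
            self._cascading_cut(y)
        if x.key < self.min.key:
            self.min = x

    def _cut(self, x, y):
        y.children.remove(x)
        self.roots.append(x)
        x.p, x.mark = None, False

    def _cascading_cut(self, y):
        z = y.p
        if z is not None:
            if not y.mark:
                y.mark = True
            else:
                self._cut(y, z)
                self._cascading_cut(z)

    def delete(self, x):
        self.decrease_key(x, -math.inf)          # x becomes the unique minimum
        self.extract_min()                       # which is then removed


H = FibHeap()
nodes = {k: H.insert(k) for k in [23, 7, 21, 3, 18, 52, 38, 39, 41, 17]}
H.extract_min()                 # removes 3 and consolidates the roots into trees
H.delete(nodes[39])
H.delete(nodes[7])
out = []
while H.n:
    out.append(H.extract_min().key)
print(out)
```

The deleted keys 39 and 7 never reappear, and the remaining keys come out in sorted order.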

Exercises 


19.3-1 

Suppose  that  a  root  x  in  a  Fibonacci  heap  is  marked.  Explain  how  x  came  to  be 
a  marked  root.  Argue  that  it  doesn’t  matter  to  the  analysis  that  x  is  marked,  even 
though  it  is  not  a  root  that  was  first  linked  to  another  node  and  then  lost  one  child. 


19.3-2 

Justify the O(1) amortized time of Fib-Heap-Decrease-Key as an average cost
per operation by using aggregate analysis.


19.4 Bounding the maximum degree

To prove that the amortized time of Fib-Heap-Extract-Min and Fib-Heap-Delete
is O(lg n), we must show that the upper bound D(n) on the degree of
any node of an n-node Fibonacci heap is O(lg n). In particular, we shall show that
$D(n) \le \lfloor \log_\phi n \rfloor$, where $\phi$ is the golden ratio, defined in equation (3.24) as

\[ \phi = (1 + \sqrt{5})/2 = 1.61803\ldots . \]

The  key  to  the  analysis  is  as  follows.  For  each  node  x  within  a  Fibonacci  heap, 
define  size(x)  to  be  the  number  of  nodes,  including  x  itself,  in  the  subtree  rooted 
at x. (Note that x need not be in the root list; it can be any node at all.) We shall
show that size(x) is exponential in x.degree. Bear in mind that x.degree is always
maintained as an accurate count of the degree of x.

Lemma  19.1 

Let x be any node in a Fibonacci heap, and suppose that x.degree = k. Let
$y_1, y_2, \ldots, y_k$ denote the children of x in the order in which they were linked to x,
from the earliest to the latest. Then, $y_1.\mathit{degree} \ge 0$ and $y_i.\mathit{degree} \ge i - 2$ for
$i = 2, 3, \ldots, k$.


Proof Obviously, $y_1.\mathit{degree} \ge 0$.

For $i \ge 2$, we note that when $y_i$ was linked to x, all of $y_1, y_2, \ldots, y_{i-1}$ were
children of x, and so we must have had $x.\mathit{degree} \ge i - 1$. Because node $y_i$ is
linked to x (by CONSOLIDATE) only if $x.\mathit{degree} = y_i.\mathit{degree}$, we must have also
had $y_i.\mathit{degree} \ge i - 1$ at that time. Since then, node $y_i$ has lost at most one
child, since it would have been cut from x (by Cascading-Cut) if it had lost
two children. We conclude that $y_i.\mathit{degree} \ge i - 2$. ■


We finally come to the part of the analysis that explains the name “Fibonacci
heaps.” Recall from Section 3.2 that for $k = 0, 1, 2, \ldots$, the kth Fibonacci number
is defined by the recurrence

\[
F_k = \begin{cases}
0 & \text{if } k = 0 , \\
1 & \text{if } k = 1 , \\
F_{k-1} + F_{k-2} & \text{if } k \ge 2 .
\end{cases}
\]

The following lemma gives another way to express $F_k$.

Lemma 19.2

For all integers $k \ge 0$,

\[ F_{k+2} = 1 + \sum_{i=0}^{k} F_i . \]

Proof The proof is by induction on k. When k = 0,

\[ 1 + \sum_{i=0}^{0} F_i = 1 + F_0 = 1 + 0 = F_2 . \]

We now assume the inductive hypothesis that $F_{k+1} = 1 + \sum_{i=0}^{k-1} F_i$, and we
have

\[ F_{k+2} = F_k + F_{k+1} = F_k + \Bigl( 1 + \sum_{i=0}^{k-1} F_i \Bigr) = 1 + \sum_{i=0}^{k} F_i . \]

■

Lemma 19.3

For all integers $k \ge 0$, the $(k+2)$nd Fibonacci number satisfies $F_{k+2} \ge \phi^k$.

Proof The proof is by induction on k. The base cases are for k = 0 and k = 1.
When k = 0 we have $F_2 = 1 = \phi^0$, and when k = 1 we have $F_3 = 2 >
1.619 > \phi^1$. The inductive step is for $k \ge 2$, and we assume that $F_{i+2} \ge \phi^i$ for
$i = 0, 1, \ldots, k-1$. Recall that $\phi$ is the positive root of equation (3.23), $x^2 = x + 1$.
Thus, we have

\[
\begin{aligned}
F_{k+2} &= F_{k+1} + F_k \\
        &\ge \phi^{k-1} + \phi^{k-2} && \text{(by the inductive hypothesis)} \\
        &= \phi^{k-2} (\phi + 1) \\
        &= \phi^{k-2} \cdot \phi^2 && \text{(by equation (3.23))} \\
        &= \phi^k .
\end{aligned}
\]

■
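
Lemmas 19.2 and 19.3 can be spot-checked numerically. The following Python sketch is only a sanity check for small k, not a proof. The helper name fib and the constant PHI are hypothetical, and the comparison against powers of the golden ratio relies on floating-point arithmetic.

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2


def fib(count):
    """Return [F_0, F_1, ..., F_{count-1}] from the defining recurrence."""
    F = [0, 1]
    while len(F) < count:
        F.append(F[-1] + F[-2])
    return F[:count]


F = fib(40)
for k in range(38):
    assert F[k + 2] == 1 + sum(F[: k + 1])      # Lemma 19.2
    assert F[k + 2] >= PHI ** k                 # Lemma 19.3 (floating-point check)
print(F[:10])
```

Equality in Lemma 19.3 occurs only at k = 0; for larger k the ratio of $F_{k+2}$ to $\phi^k$ stays comfortably above 1, so rounding error does not affect the check.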

The following lemma and its corollary complete the analysis.

Lemma 19.4

Let x be any node in a Fibonacci heap, and let k = x.degree. Then $\mathit{size}(x) \ge
F_{k+2} \ge \phi^k$, where $\phi = (1 + \sqrt{5})/2$.

Proof Let $s_k$ denote the minimum possible size of any node of degree k in any
Fibonacci heap. Trivially, $s_0 = 1$ and $s_1 = 2$. The number $s_k$ is at most size(x)
and, because adding children to a node cannot decrease the node’s size, the value
of $s_k$ increases monotonically with k. Consider some node z, in any Fibonacci
heap, such that z.degree = k and $\mathit{size}(z) = s_k$. Because $s_k \le \mathit{size}(x)$, we
compute a lower bound on size(x) by computing a lower bound on $s_k$. As in
Lemma 19.1, let $y_1, y_2, \ldots, y_k$ denote the children of z in the order in which they
were linked to z. To bound $s_k$, we count one for z itself and one for the first child $y_1$
(for which $\mathit{size}(y_1) \ge 1$), giving

\[
\begin{aligned}
\mathit{size}(x) &\ge s_k \\
  &\ge 2 + \sum_{i=2}^{k} s_{y_i.\mathit{degree}} \\
  &\ge 2 + \sum_{i=2}^{k} s_{i-2} ,
\end{aligned}
\]

where the last line follows from Lemma 19.1 (so that $y_i.\mathit{degree} \ge i - 2$) and the
monotonicity of $s_k$ (so that $s_{y_i.\mathit{degree}} \ge s_{i-2}$).

We now show by induction on k that $s_k \ge F_{k+2}$ for all nonnegative integers k.
The bases, for k = 0 and k = 1, are trivial. For the inductive step, we assume that
$k \ge 2$ and that $s_i \ge F_{i+2}$ for $i = 0, 1, \ldots, k-1$. We have

\[
\begin{aligned}
s_k &\ge 2 + \sum_{i=2}^{k} s_{i-2} \\
    &\ge 2 + \sum_{i=2}^{k} F_i \\
    &= 1 + \sum_{i=0}^{k} F_i \\
    &= F_{k+2} && \text{(by Lemma 19.2)} \\
    &\ge \phi^k && \text{(by Lemma 19.3)} .
\end{aligned}
\]

Thus, we have shown that $\mathit{size}(x) \ge s_k \ge F_{k+2} \ge \phi^k$. ■

Corollary 19.5

The maximum degree D(n) of any node in an n-node Fibonacci heap is O(lg n).

Proof Let x be any node in an n-node Fibonacci heap, and let k = x.degree.
By Lemma 19.4, we have $n \ge \mathit{size}(x) \ge \phi^k$. Taking base-$\phi$ logarithms gives
us $k \le \log_\phi n$. (In fact, because k is an integer, $k \le \lfloor \log_\phi n \rfloor$.) The maximum
degree D(n) of any node is thus O(lg n). ■
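
The bounds above translate into a small computation that could be used, for example, to size the array A in CONSOLIDATE. The Python sketch below is an illustration under stated assumptions. The function min_sizes evaluates the recurrence for $s_k$ with equality (the lemma itself only claims an inequality). The function max_degree_bound returns the largest k with $F_{k+2} \le n$, which by Lemma 19.4 bounds any node's degree and never exceeds $\lfloor \log_\phi n \rfloor$. All names are hypothetical.

```python
from math import floor, log, sqrt

PHI = (1 + sqrt(5)) / 2


def fib(k):
    """Return F_k, with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def min_sizes(max_degree):
    """s_0..s_max_degree from s_k = 2 + sum_{i=2}^{k} s_{i-2}, taken with equality."""
    s = [1, 2]                                   # s_0 = 1, s_1 = 2
    for k in range(2, max_degree + 1):
        s.append(2 + sum(s[i - 2] for i in range(2, k + 1)))
    return s[: max_degree + 1]


def max_degree_bound(n):
    """Largest k with F_{k+2} <= n, using integer arithmetic only."""
    if n < 1:
        raise ValueError("n must be at least 1")
    k, f_k1, f_k2 = 0, 1, 1          # f_k1 = F_{k+1}, f_k2 = F_{k+2}
    while f_k1 + f_k2 <= n:          # F_{k+3} <= n, so degree k + 1 is still possible
        f_k1, f_k2 = f_k2, f_k1 + f_k2
        k += 1
    return k


print(min_sizes(10))
for n in (1, 2, 3, 4, 8, 1000):
    print(n, max_degree_bound(n), floor(log(n, PHI)))
```

Working with Fibonacci numbers directly avoids floating-point logarithms; the floor of the base-$\phi$ logarithm is printed only for comparison.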

Exercises 


19.4-1 

Professor Pinocchio claims that the height of an n-node Fibonacci heap is O(lg n).
Show that the professor is mistaken by exhibiting, for any positive integer n, a
sequence of Fibonacci-heap operations that creates a Fibonacci heap consisting of
just one tree that is a linear chain of n nodes.


19.4-2 

Suppose we generalize the cascading-cut rule to cut a node x from its parent as
soon as it loses its kth child, for some integer constant k. (The rule in Section 19.3
uses k = 2.) For what values of k is D(n) = O(lg n)?


Problems 


19-1  Alternative  implementation  of  deletion 

Professor Pisano has proposed the following variant of the Fib-Heap-Delete
procedure, claiming that it runs faster when the node being deleted is not the node
pointed to by H.min.

Pisano-Delete(H, x)

1  if x == H.min
2      Fib-Heap-Extract-Min(H)
3  else y = x.p
4      if y ≠ NIL
5          Cut(H, x, y)
6          Cascading-Cut(H, y)
7      add x’s child list to the root list of H
8      remove x from the root list of H


Problems  for  Chapter  19 
a. The professor’s claim that this procedure runs faster is based partly on the
assumption that line 7 can be performed in O(1) actual time. What is wrong with
this assumption?

b. Give a good upper bound on the actual time of Pisano-Delete when x is
not H.min. Your bound should be in terms of x.degree and the number c of
calls to the Cascading-Cut procedure.

c.  Suppose  that  we  call  Pisano-Delete (H,  x),  and  let  H'  be  the  Fibonacci  heap 
that  results.  Assuming  that  node  x  is  not  a  root,  bound  the  potential  of  H'  in 
terms of x.degree, c, t(H), and m(H).

d. Conclude that the amortized time for Pisano-Delete is asymptotically no
better than for Fib-Heap-Delete, even when x ≠ H.min.

19-2  Binomial  trees  and  binomial  heaps 

The binomial tree $B_k$ is an ordered tree (see Section B.5.2) defined recursively.
As shown in Figure 19.6(a), the binomial tree $B_0$ consists of a single node. The
binomial tree $B_k$ consists of two binomial trees $B_{k-1}$ that are linked together so
that the root of one is the leftmost child of the root of the other. Figure 19.6(b)
shows the binomial trees $B_0$ through $B_4$.

a. Show that for the binomial tree $B_k$,

1. there are $2^k$ nodes,

2. the height of the tree is k,

3. there are exactly $\binom{k}{i}$ nodes at depth i for i = 0, 1, ..., k, and

4. the root has degree k, which is greater than that of any other node; moreover,
as Figure 19.6(c) shows, if we number the children of the root from left to
right by k - 1, k - 2, ..., 0, then child i is the root of a subtree $B_i$.

A binomial heap H is a set of binomial trees that satisfies the following
properties:

1.  Each  node  has  a  key  (like  a  Fibonacci  heap). 

2.  Each  binomial  tree  in  H  obeys  the  min-heap  property. 

3.  For  any  nonnegative  integer  k,  there  is  at  most  one  binomial  tree  in  H  whose 
root  has  degree  k. 

b.  Suppose  that  a  binomial  heap  H  has  a  total  of  n  nodes.  Discuss  the  relationship 
between the binomial trees that H contains and the binary representation of n.
Conclude that H consists of at most $\lfloor \lg n \rfloor + 1$ binomial trees.
Figure 19.6 (a) The recursive definition of the binomial tree $B_k$. Triangles represent rooted
subtrees. (b) The binomial trees $B_0$ through $B_4$. Node depths in $B_4$ are shown. (c) Another way of
looking at the binomial tree $B_k$.


Suppose that we represent a binomial heap as follows. The left-child,
right-sibling scheme of Section 10.4 represents each binomial tree within a binomial
heap. Each node contains its key; pointers to its parent, to its leftmost child, and
to the sibling immediately to its right (these pointers are NIL when appropriate);
and  its  degree  (as  in  Fibonacci  heaps,  how  many  children  it  has).  The  roots  form  a 
singly  linked  root  list,  ordered  by  the  degrees  of  the  roots  (from  low  to  high),  and 
we  access  the  binomial  heap  by  a  pointer  to  the  first  node  on  the  root  list. 

c. Complete the description of how to represent a binomial heap (i.e., name the
attributes, describe when attributes have the value NIL, and define how the root
list is organized), and show how to implement the same seven operations on
binomial heaps as this chapter implemented on Fibonacci heaps. Each operation
should run in O(lg n) worst-case time, where n is the number of nodes in
the binomial heap (or in the case of the Union operation, in the two binomial
heaps that are being united). The Make-Heap operation should take constant
time.

d. Suppose that we were to implement only the mergeable-heap operations on a
Fibonacci heap (i.e., we do not implement the Decrease-Key or Delete
operations). How would the trees in a Fibonacci heap resemble those in a binomial
heap? How would they differ? Show that the maximum degree in an n-node
Fibonacci heap would be at most $\lfloor \lg n \rfloor$.

e.  Professor  McGee  has  devised  a  new  data  structure  based  on  Fibonacci  heaps. 
A  McGee  heap  has  the  same  structure  as  a  Fibonacci  heap  and  supports  just 
the  mergeable-heap  operations.  The  implementations  of  the  operations  are  the 
same  as  for  Fibonacci  heaps,  except  that  insertion  and  union  consolidate  the 
root  list  as  their  last  step.  What  are  the  worst-case  running  times  of  operations 
on  McGee  heaps? 

19-3  More  Fibonacci-heap  operations 

We  wish  to  augment  a  Fibonacci  heap  H  to  support  two  new  operations  without 
changing  the  amortized  running  time  of  any  other  Fibonacci-heap  operations. 

a. The operation Fib-Heap-Change-Key(H, x, k) changes the key of node x
to the value k. Give an efficient implementation of Fib-Heap-Change-Key,
and  analyze  the  amortized  running  time  of  your  implementation  for  the  cases 
in  which  k  is  greater  than,  less  than,  or  equal  to  x.key. 

b. Give an efficient implementation of Fib-Heap-Prune(H, r), which deletes
q = min(r, H.n) nodes from H. You may choose any q nodes to delete.
Analyze the amortized running time of your implementation. (Hint: You may need
to modify the data structure and potential function.)

19-4  2-3-4  heaps 

Chapter 18 introduced the 2-3-4 tree, in which every internal node (other than
possibly the root) has two, three, or four children and all leaves have the same depth. In
this  problem,  we  shall  implement  2-3-4  heaps,  which  support  the  mergeable-heap 
operations. 

The 2-3-4 heaps differ from 2-3-4 trees in the following ways. In 2-3-4 heaps,
only leaves store keys, and each leaf x stores exactly one key in the attribute x.key.
The keys in the leaves may appear in any order. Each internal node x contains
a value x.small that is equal to the smallest key stored in any leaf in the subtree
rooted at x. The root r contains an attribute r.height that gives the height of the
tree. Finally, 2-3-4 heaps are designed to be kept in main memory, so that disk
reads and writes are not needed.

Implement the following 2-3-4 heap operations. In parts (a)-(e), each operation
should run in O(lg n) time on a 2-3-4 heap with n elements. The Union operation
in part (f) should run in O(lg n) time, where n is the number of elements in the two
input heaps.

a.  Minimum,  which  returns  a  pointer  to  the  leaf  with  the  smallest  key. 

b.  Decrease-Key,  which  decreases  the  key  of  a  given  leaf  x  to  a  given  value 
k  <  x.key. 

c.  Insert,  which  inserts  leaf  x  with  key  k. 

d.  Delete,  which  deletes  a  given  leaf  x. 

e.  Extract-Min,  which  extracts  the  leaf  with  the  smallest  key. 

f. Union, which unites two 2-3-4 heaps, returning a single 2-3-4 heap and
destroying the input heaps.


Chapter  notes 

Fredman  and  Tarjan  [114]  introduced  Fibonacci  heaps.  Their  paper  also  describes 
the  application  of  Fibonacci  heaps  to  the  problems  of  single-source  shortest  paths, 
all-pairs  shortest  paths,  weighted  bipartite  matching,  and  the  minimum-spanning- 
tree  problem. 

Subsequently, Driscoll, Gabow, Shrairman, and Tarjan [96] developed “relaxed
heaps” as an alternative to Fibonacci heaps. They devised two varieties of
relaxed heaps. One gives the same amortized time bounds as Fibonacci heaps. The
other allows Decrease-Key to run in O(1) worst-case (not amortized) time and
Extract-Min and Delete to run in O(lg n) worst-case time. Relaxed heaps
also have some advantages over Fibonacci heaps in parallel algorithms.

See also the chapter notes for Chapter 6 for other data structures that support fast
Decrease-Key operations when the sequence of values returned by Extract-Min
calls is monotonically increasing over time and the data are integers in a
specific range.
In previous chapters, we saw data structures that support the operations of a priority
queue: binary heaps in Chapter 6, red-black trees in Chapter 13,¹ and Fibonacci
heaps in Chapter 19. In each of these data structures, at least one important
operation took O(lg n) time, either worst case or amortized. In fact, because each
of these data structures bases its decisions on comparing keys, the Ω(n lg n) lower
bound for sorting in Section 8.1 tells us that at least one operation will have to
take Ω(lg n) time. Why? If we could perform both the Insert and Extract-Min
operations in o(lg n) time, then we could sort n keys in o(n lg n) time by first
performing n Insert operations, followed by n Extract-Min operations.
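
The reduction from sorting to priority-queue operations can be made concrete with a short sketch. It uses Python's standard heapq module, a binary heap, as a stand-in for the priority queue; the function name priority_queue_sort is our own choice for illustration.

```python
import heapq


def priority_queue_sort(keys):
    """Sort keys with n Insert operations followed by n Extract-Min operations."""
    heap = []
    for key in keys:
        heapq.heappush(heap, key)       # Insert
    # The range is fixed before any pop, so exactly n Extract-Min calls occur.
    return [heapq.heappop(heap) for _ in range(len(heap))]


if __name__ == "__main__":
    print(priority_queue_sort([5, 2, 9, 1, 5]))
```

With a binary heap each operation costs O(lg n), so the sort takes O(n lg n) time. Any comparison-based priority queue whose Insert and Extract-Min both ran in o(lg n) time would turn this routine into an o(n lg n) comparison sort, contradicting the lower bound.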

We saw in Chapter 8, however, that sometimes we can exploit additional
information about the keys to sort in o(n lg n) time. In particular, with counting sort
we can sort n keys, each an integer in the range 0 to k, in time Θ(n + k), which
is Θ(n) when k = O(n).

Since we can circumvent the Ω(n lg n) lower bound for sorting when the keys are
integers in a bounded range, you might wonder whether we can perform each of the
priority-queue operations in o(lg n) time in a similar scenario. In this chapter, we
shall see that we can: van Emde Boas trees support the priority-queue operations,
and a few others, each in O(lg lg n) worst-case time. The hitch is that the keys
must be integers in the range 0 to n - 1, with no duplicates allowed.

Specifically, van Emde Boas trees support each of the dynamic set operations
listed on page 230 (Search, Insert, Delete, Minimum, Maximum, Successor,
and Predecessor) in O(lg lg n) time. In this chapter, we will omit
discussion of satellite data and focus only on storing keys. Because we concentrate
on keys and disallow duplicate keys to be stored, instead of describing the Search
operation, we will implement the simpler operation Member(S, x), which returns
a boolean indicating whether the value x is currently in dynamic set S.

¹ Chapter 13 does not explicitly discuss how to implement Extract-Min and Decrease-Key, but
we can easily build these operations for any data structure that supports Minimum, Delete, and
Insert.

So far, we have used the parameter n for two distinct purposes: the number of
elements in the dynamic set, and the range of the possible values. To avoid any
further confusion, from here on we will use n to denote the number of elements
currently in the set and u as the range of possible values, so that each van Emde
Boas tree operation runs in O(lg lg u) time. We call the set {0, 1, 2, ..., u - 1}
the universe of values that can be stored and u the universe size. We assume
throughout this chapter that u is an exact power of 2, i.e., $u = 2^k$ for some integer
k ≥ 1.

Section  20.1  starts  us  out  by  examining  some  simple  approaches  that  will  get 
us  going  in  the  right  direction.  We  enhance  these  approaches  in  Section  20.2, 
introducing proto van Emde Boas structures, which are recursive but do not achieve
our goal of O(lg lg u)-time operations. Section 20.3 modifies proto van Emde Boas
structures to develop van Emde Boas trees, and it shows how to implement each
operation in O(lg lg u) time.


20.1  Preliminary  approaches 

In  this  section,  we  shall  examine  various  approaches  for  storing  a  dynamic  set. 
Although none will achieve the O(lg lg u) time bounds that we desire, we will gain
insights  that  will  help  us  understand  van  Emde  Boas  trees  when  we  see  them  later 
in  this  chapter. 

Direct  addressing 

Direct addressing, as we saw in Section 11.1, provides the simplest approach to
storing a dynamic set. Since in this chapter we are concerned only with storing
keys, we can simplify the direct-addressing approach to store the dynamic set as a
bit vector, as discussed in Exercise 11.1-2. To store a dynamic set of values from
the universe {0, 1, 2, ..., u - 1}, we maintain an array A[0 .. u - 1] of u bits. The
entry A[x] holds a 1 if the value x is in the dynamic set, and it holds a 0 otherwise.
Although we can perform each of the Insert, Delete, and Member operations
in O(1) time with a bit vector, the remaining operations (Minimum, Maximum,
Successor, and Predecessor) each take Θ(u) time in the worst case because
we might have to scan through Θ(u) elements.² For example, if a set contains only
the values 0 and u - 1, then to find the successor of 0, we would have to scan
entries 1 through u - 2 before finding a 1 in A[u - 1].

Figure 20.1 A binary tree of bits superimposed on top of a bit vector representing the set
{2, 3, 4, 5, 7, 14, 15} when u = 16. Each internal node contains a 1 if and only if some leaf in
its subtree contains a 1. The arrows show the path followed to determine the predecessor of 14 in the
set.
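
The bit-vector approach to direct addressing can be sketched in a few lines of Python. The class name BitVectorSet and the use of None for NIL are our own conventions for this illustration, and a Python list of 0s and 1s stands in for a packed array of bits.

```python
class BitVectorSet:
    """Dynamic set over the universe {0, ..., u-1}, stored as a bit vector."""

    def __init__(self, u):
        self.u = u
        self.A = [0] * u                # A[x] == 1 exactly when x is in the set

    def insert(self, x):                # O(1)
        self.A[x] = 1

    def delete(self, x):                # O(1)
        self.A[x] = 0

    def member(self, x):                # O(1)
        return self.A[x] == 1

    def minimum(self):                  # scans up to u entries
        return self.successor(-1)

    def maximum(self):                  # scans up to u entries
        return self.predecessor(self.u)

    def successor(self, x):
        """Smallest element greater than x, or None; linear scan to the right."""
        for y in range(x + 1, self.u):
            if self.A[y]:
                return y
        return None

    def predecessor(self, x):
        """Largest element less than x, or None; linear scan to the left."""
        for y in range(x - 1, -1, -1):
            if self.A[y]:
                return y
        return None
```

With only 0 and u - 1 stored, the call successor(0) inspects entries 1 through u - 1, which is the worst case described above.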

Superimposing a binary tree structure

We  can  short-cut  long  scans  in  the  bit  vector  by  superimposing  a  binary  tree  of  bits 
on top of it. Figure 20.1 shows an example. The entries of the bit vector form the
leaves  of  the  binary  tree,  and  each  internal  node  contains  a  1  if  and  only  if  any  leaf 
in  its  subtree  contains  a  1.  In  other  words,  the  bit  stored  in  an  internal  node  is  the 
logical-or  of  its  two  children. 

The operations that took Θ(u) worst-case time with an unadorned bit vector now
use the tree structure:

•  To  find  the  minimum  value  in  the  set,  start  at  the  root  and  head  down  toward 
the  leaves,  always  taking  the  leftmost  node  containing  a  1. 

•  To  find  the  maximum  value  in  the  set,  start  at  the  root  and  head  down  toward 
the  leaves,  always  taking  the  rightmost  node  containing  a  1. 


² We assume throughout this chapter that Minimum and Maximum return NIL if the dynamic set
is empty and that Successor and Predecessor return NIL if the element they are given has no
successor or predecessor, respectively.


Chapter  20  van  Emde  Boas  Trees 


• To find the successor of x, start at the leaf indexed by x, and head up toward the
root until we enter a node from the left and this node has a 1 in its right child z.
Then head down through node z, always taking the leftmost node containing
a 1 (i.e., find the minimum value in the subtree rooted at the right child z).

• To find the predecessor of x, start at the leaf indexed by x, and head up toward
the root until we enter a node from the right and this node has a 1 in its left
child z. Then head down through node z, always taking the rightmost node
containing a 1 (i.e., find the maximum value in the subtree rooted at the left
child z).

Figure  20.1  shows  the  path  taken  to  find  the  predecessor,  7,  of  the  value  14. 

We also augment the Insert and Delete operations appropriately. When inserting
a value, we store a 1 in each node on the simple path from the appropriate
leaf  up  to  the  root.  When  deleting  a  value,  we  go  from  the  appropriate  leaf  up  to 
the  root,  recomputing  the  bit  in  each  internal  node  on  the  path  as  the  logical-or  of 
its  two  children. 

Since  the  height  of  the  tree  is  lg  u  and  each  of  the  above  operations  makes  at 
most one pass up the tree and at most one pass down, each operation takes O(lg u)
time  in  the  worst  case. 
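
One way to realize the superimposed binary tree is to store it implicitly in a single array, as is done for binary heaps. The sketch below makes that assumption: node 1 is the root, node i has children 2i and 2i + 1, and leaf x lives at index u + x. The class name BitTreeSet, the helper _descend, and the use of None for NIL are ours, and u is assumed to be a power of 2.

```python
class BitTreeSet:
    """Bit vector with a binary tree of logical-or bits on top (u a power of 2)."""

    def __init__(self, u):
        self.u = u
        self.tree = [0] * (2 * u)       # tree[1] is the root; leaf x is tree[u + x]

    def member(self, x):
        return self.tree[self.u + x] == 1

    def insert(self, x):
        node = self.u + x
        while node >= 1:                # store a 1 on the path from leaf to root
            self.tree[node] = 1
            node //= 2

    def delete(self, x):
        node = self.u + x
        self.tree[node] = 0
        node //= 2
        while node >= 1:                # recompute each ancestor as the or of its children
            self.tree[node] = self.tree[2 * node] | self.tree[2 * node + 1]
            node //= 2

    def _descend(self, node, rightmost):
        """Walk down from a node holding a 1 to its leftmost or rightmost 1 leaf."""
        while node < self.u:
            left, right = 2 * node, 2 * node + 1
            if rightmost:
                node = right if self.tree[right] else left
            else:
                node = left if self.tree[left] else right
        return node - self.u

    def minimum(self):
        return self._descend(1, False) if self.tree[1] else None

    def maximum(self):
        return self._descend(1, True) if self.tree[1] else None

    def successor(self, x):
        node = self.u + x
        while node > 1:
            # node is a left child whose right sibling z holds a 1
            if node % 2 == 0 and self.tree[node + 1]:
                return self._descend(node + 1, False)
            node //= 2
        return None

    def predecessor(self, x):
        node = self.u + x
        while node > 1:
            # node is a right child whose left sibling z holds a 1
            if node % 2 == 1 and self.tree[node - 1]:
                return self._descend(node - 1, True)
            node //= 2
        return None
```

Each operation walks at most once up and once down a tree of height lg u. As Exercise 20.1-3 observes, successor and predecessor work whether or not x itself is in the set.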

This  approach  is  only  marginally  better  than  just  using  a  red-black  tree.  We  can 
still perform the Member operation in O(1) time, whereas searching a red-black
tree takes O(lg n) time. Then again, if the number n of elements stored is much
smaller  than  the  size  u  of  the  universe,  a  red-black  tree  would  be  faster  for  all  the 
other  operations. 

Superimposing  a  tree  of  constant  height 

What happens if we superimpose a tree with greater degree? Let us assume that
the size of the universe is $u = 2^{2k}$ for some integer k, so that $\sqrt{u}$ is an integer.
Instead of superimposing a binary tree on top of the bit vector, we superimpose a
tree of degree $\sqrt{u}$. Figure 20.2(a) shows such a tree for the same bit vector as in
Figure 20.1. The height of the resulting tree is always 2.

Figure 20.2 (a) A tree of degree $\sqrt{u}$ superimposed on top of the same bit vector as in Figure 20.1.
Each internal node stores the logical-or of the bits in its subtree. (b) A view of the same structure,
but with the internal nodes at depth 1 treated as an array $\mathit{summary}[0 .. \sqrt{u} - 1]$, where summary[i] is
the logical-or of the subarray $A[i\sqrt{u} .. (i + 1)\sqrt{u} - 1]$.

As before, each internal node stores the logical-or of the bits within its
subtree, so that the $\sqrt{u}$ internal nodes at depth 1 summarize each group of $\sqrt{u}$
values. As Figure 20.2(b) demonstrates, we can think of these nodes as an array
$\mathit{summary}[0 .. \sqrt{u} - 1]$, where summary[i] contains a 1 if and only if the
subarray $A[i\sqrt{u} .. (i + 1)\sqrt{u} - 1]$ contains a 1. We call this $\sqrt{u}$-bit subarray of A
the ith cluster. For a given value of x, the bit A[x] appears in cluster
number $\lfloor x/\sqrt{u} \rfloor$. Now Insert becomes an O(1)-time operation: to insert x, set
both A[x] and $\mathit{summary}[\lfloor x/\sqrt{u} \rfloor]$ to 1. We can use the summary array to perform
each of the operations Minimum, Maximum, Successor, Predecessor, and
Delete in $O(\sqrt{u})$ time:

• To find the minimum (maximum) value, find the leftmost (rightmost) entry in
summary that contains a 1, say summary[i], and then do a linear search within
the ith cluster for the leftmost (rightmost) 1.

• To find the successor (predecessor) of x, first search to the right (left) within its
cluster. If we find a 1, that position gives the result. Otherwise, let $i = \lfloor x/\sqrt{u} \rfloor$
and search to the right (left) within the summary array from index i. The first
position that holds a 1 gives the index of a cluster. Search within that cluster
for the leftmost (rightmost) 1. That position holds the successor (predecessor).

• To delete the value x, let $i = \lfloor x/\sqrt{u} \rfloor$. Set A[x] to 0 and then set summary[i]
to the logical-or of the bits in the ith cluster.

In each of the above operations, we search through at most two clusters of $\sqrt{u}$ bits
plus the summary array, and so each operation takes $O(\sqrt{u})$ time.
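
The two-level scheme can be sketched as follows. The class name SummarySet and the use of None for NIL are our own conventions, and u is assumed to be a perfect square so that math.isqrt returns $\sqrt{u}$ exactly. Maximum and predecessor mirror minimum and successor and are omitted to keep the sketch short.

```python
import math


class SummarySet:
    """Bit vector A plus a summary array with one bit per cluster of sqrt(u) bits."""

    def __init__(self, u):
        self.u = u
        self.root = math.isqrt(u)       # sqrt(u), assumed to be an integer
        self.A = [0] * u
        self.summary = [0] * self.root

    def member(self, x):
        return self.A[x] == 1

    def insert(self, x):                # O(1): set the bit and its cluster's summary bit
        self.A[x] = 1
        self.summary[x // self.root] = 1

    def delete(self, x):                # O(sqrt(u)): recompute the summary bit
        i = x // self.root
        self.A[x] = 0
        start = i * self.root
        self.summary[i] = int(any(self.A[start:start + self.root]))

    def _leftmost_in_cluster(self, i):
        start = i * self.root
        for y in range(start, start + self.root):
            if self.A[y]:
                return y
        return None

    def minimum(self):
        for i in range(self.root):      # leftmost nonempty cluster
            if self.summary[i]:
                return self._leftmost_in_cluster(i)
        return None

    def successor(self, x):
        i = x // self.root
        for y in range(x + 1, (i + 1) * self.root):   # rest of x's own cluster
            if self.A[y]:
                return y
        for j in range(i + 1, self.root):             # next nonempty cluster
            if self.summary[j]:
                return self._leftmost_in_cluster(j)
        return None
```

Every loop runs over at most $\sqrt{u}$ entries, and each operation executes at most three such loops, which gives the $O(\sqrt{u})$ bound.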

At first glance, it seems as though we have made negative progress. Superimposing
a binary tree gave us O(lg u)-time operations, which are asymptotically faster
than $O(\sqrt{u})$ time. Using a tree of degree $\sqrt{u}$ will turn out to be a key idea of van
Emde Boas trees, however. We continue down this path in the next section.

Exercises 


20.1-1 

Modify  the  data  structures  in  this  section  to  support  duplicate  keys. 
20.1-2

Modify the data structures in this section to support keys that have associated
satellite data.


20.1-3 

Observe  that,  using  the  structures  in  this  section,  the  way  we  find  the  successor  and 
predecessor  of  a  value  x  does  not  depend  on  whether  x  is  in  the  set  at  the  time. 
Show  how  to  find  the  successor  of  x  in  a  binary  search  tree  when  x  is  not  stored  in 
the  tree. 


20.1-4 

Suppose that instead of superimposing a tree of degree $\sqrt{u}$, we were to
superimpose a tree of degree $u^{1/k}$, where k > 1 is a constant. What would be the height of
such a tree, and how long would each of the operations take?


20.2  A  recursive  structure 

In this section, we modify the idea of superimposing a tree of degree $\sqrt{u}$ on top of
a bit vector. In the previous section, we used a summary structure of size $\sqrt{u}$, with
each entry pointing to another structure of size $\sqrt{u}$. Now, we make the structure
recursive, shrinking the universe size by the square root at each level of recursion.
Starting with a universe of size u, we make structures holding $\sqrt{u} = u^{1/2}$ items,
which themselves hold structures of $u^{1/4}$ items, which hold structures of $u^{1/8}$ items,
and so on, down to a base size of 2.

For simplicity, in this section, we assume that $u = 2^{2^k}$ for some integer k, so
that $u, u^{1/2}, u^{1/4}, \ldots$ are integers. This restriction would be quite severe in practice,
allowing only values of u in the sequence 2, 4, 16, 256, 65536, .... We shall see in
the next section how to relax this assumption and assume only that $u = 2^k$ for
some integer k. Since the structure we examine in this section is only a precursor
to the true van Emde Boas tree structure, we tolerate this restriction in favor of
aiding our understanding.

Recalling that our goal is to achieve running times of O(lg lg u) for the
operations, let’s think about how we might obtain such running times. At the end of
Section 4.3, we saw that by changing variables, we could show that the recurrence

$T(n) = 2T(\lfloor \sqrt{n} \rfloor) + \lg n$    (20.1)

has the solution T(n) = O(lg n lg lg n). Let’s consider a similar, but simpler,
recurrence:

$T(u) = T(\sqrt{u}) + O(1)$.    (20.2)
If we use the same technique, changing variables, we can show that
recurrence (20.2) has the solution T(u) = O(lg lg u). Let m = lg u, so that $u = 2^m$
and we have

$T(2^m) = T(2^{m/2}) + O(1)$.

Now we rename $S(m) = T(2^m)$, giving the new recurrence

$S(m) = S(m/2) + O(1)$.

By case 2 of the master method, this recurrence has the solution S(m) = O(lg m).
We change back from S(m) to T(u), giving
$T(u) = T(2^m) = S(m) = O(\lg m) = O(\lg \lg u)$.

Recurrence (20.2) will guide our search for a data structure. We will design a
recursive data structure that shrinks by a factor of $\sqrt{u}$ in each level of its recursion.
When  an  operation  traverses  this  data  structure,  it  will  spend  a  constant  amount  of 
time  at  each  level  before  recursing  to  the  level  below.  Recurrence  (20.2)  will  then 
characterize  the  running  time  of  the  operation. 

Here is another way to think of how the term lg lg u ends up in the solution to
recurrence (20.2). As we look at the universe size in each level of the recursive data
structure, we see the sequence $u, u^{1/2}, u^{1/4}, u^{1/8}, \ldots$. If we consider how many bits
we need to store the universe size at each level, we need lg u at the top level, and
each level needs half the bits of the previous level. In general, if we start with b
bits and halve the number of bits at each level, then after lg b levels, we get down
to just one bit. Since b = lg u, we see that after lg lg u levels, we have a universe
size of 2.
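
The level count is easy to confirm by direct computation. The sketch below assumes universe sizes of the form $u = 2^{2^k}$, so that every square root is an exact integer; the function name levels is ours.

```python
import math


def levels(u):
    """Count how many times we take a square root before the universe size reaches 2."""
    count = 0
    while u > 2:
        u = math.isqrt(u)               # exact for u = 2**(2**k)
        count += 1
    return count


if __name__ == "__main__":
    for k in range(6):
        u = 2 ** (2 ** k)
        print(u, levels(u), math.log2(math.log2(u)))
```

For each u printed, the number of levels equals lg lg u, matching the solution of recurrence (20.2) when each level costs constant time.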

Looking back at the data structure in Figure 20.2, a given value x resides in
cluster number $\lfloor x/\sqrt{u} \rfloor$. If we view x as a (lg u)-bit binary integer, that cluster
number, $\lfloor x/\sqrt{u} \rfloor$, is given by the most significant (lg u)/2 bits of x. Within its
cluster, x appears in position $x \bmod \sqrt{u}$, which is given by the least significant
(lg u)/2 bits of x. We will need to index in this way, and so let us define some
functions that will help us do so:

$\mathrm{high}(x) = \lfloor x/\sqrt{u} \rfloor$,
$\mathrm{low}(x) = x \bmod \sqrt{u}$,
$\mathrm{index}(x, y) = x\sqrt{u} + y$.

The function high(x) gives the most significant (lg u)/2 bits of x, producing the
number of x’s cluster. The function low(x) gives the least significant (lg u)/2 bits
of x and provides x’s position within its cluster. The function index(x, y) builds an
element number from x and y, treating x as the most significant (lg u)/2 bits of the
element number and y as the least significant (lg u)/2 bits. We have the identity
x = index(high(x), low(x)). The value of u used by each of these functions will
always be the universe size of the data structure in which we call the function,
which changes as we descend into the recursive structure.

Figure 20.3 The information in a proto-vEB(u) structure when u ≥ 4. The structure contains the
universe size u, a pointer summary to a proto-vEB($\sqrt{u}$) structure, and an array $\mathit{cluster}[0 .. \sqrt{u} - 1]$
of $\sqrt{u}$ pointers to proto-vEB($\sqrt{u}$) structures.
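
The functions high, low, and index translate directly into integer arithmetic. In the sketch below the universe size u is passed as an explicit argument, which is our own convention (the text takes u from the enclosing structure), and u is assumed to have the form $2^{2^k}$. The variants high_bits and low_bits are hypothetical names showing the same computation with shifts and masks.

```python
import math


def high(x, u):
    """Cluster number of x: the most significant (lg u)/2 bits."""
    return x // math.isqrt(u)


def low(x, u):
    """Position of x within its cluster: the least significant (lg u)/2 bits."""
    return x % math.isqrt(u)


def index(x, y, u):
    """Element number whose high half is x and whose low half is y."""
    return x * math.isqrt(u) + y


def high_bits(x, u):
    half = (u.bit_length() - 1) // 2    # (lg u)/2 for u a power of 2
    return x >> half


def low_bits(x, u):
    half = (u.bit_length() - 1) // 2
    return x & ((1 << half) - 1)


if __name__ == "__main__":
    u, x = 16, 14
    print(high(x, u), low(x, u), index(high(x, u), low(x, u), u))
```

With u = 16 and x = 14 (binary 1110), the high half is 11 and the low half is 10, so element 14 sits at position 2 of cluster 3.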

20.2.1  Proto  van  Emde  Boas  structures 

Taking  our  cue  from  recurrence  (20.2),  let  us  design  a  recursive  data  structure  to 
support  the  operations.  Although  this  data  structure  will  fail  to  achieve  our  goal  of 
O(lg lg u) time for some operations, it serves as a basis for the van Emde Boas tree
structure  that  we  will  see  in  Section  20.3. 

For the universe {0, 1, 2, ..., u - 1}, we define a proto van Emde Boas structure,
or proto-vEB structure, which we denote as proto-vEB(u), recursively as
follows. Each proto-vEB(u) structure contains an attribute u giving its universe
size. In addition, it contains the following:

• If u = 2, then it is the base size, and it contains an array A[0 .. 1] of two bits.

• Otherwise, $u = 2^{2^k}$ for some integer k ≥ 1, so that u ≥ 4. In addition
to the universe size u, the data structure proto-vEB(u) contains the following
attributes, illustrated in Figure 20.3:

    • a pointer named summary to a proto-vEB($\sqrt{u}$) structure and

    • an array $\mathit{cluster}[0 .. \sqrt{u} - 1]$ of $\sqrt{u}$ pointers, each to a proto-vEB($\sqrt{u}$)
      structure.

The element x, where 0 ≤ x < u, is recursively stored in the cluster numbered
high(x) as element low(x) within that cluster.
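
The recursive definition can be mirrored by a small class whose constructor builds an empty structure. The class name ProtoVEB and the helper count_base_structures are our own, and the sketch assumes u has the form $2^{2^k}$ so that math.isqrt yields the exact square root.

```python
import math


class ProtoVEB:
    """An empty proto-vEB(u) structure; u is assumed to be 2**(2**k), k >= 0."""

    def __init__(self, u):
        self.u = u
        if u == 2:
            self.A = [0, 0]             # base size: an array of two bits
        else:
            root = math.isqrt(u)        # sqrt(u), the universe size one level down
            self.summary = ProtoVEB(root)
            self.cluster = [ProtoVEB(root) for _ in range(root)]


def count_base_structures(V):
    """Count the proto-vEB(2) structures reachable from V, summaries included."""
    if V.u == 2:
        return 1
    return count_base_structures(V.summary) + sum(
        count_base_structures(c) for c in V.cluster
    )


if __name__ == "__main__":
    V = ProtoVEB(16)
    print(len(V.cluster), V.summary.u, count_base_structures(V))
```

A proto-vEB(16) structure built this way has four clusters and one summary, each a proto-vEB(4), and each of those holds three proto-vEB(2) structures, for fifteen base structures in all.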

In the two-level structure of the previous section, each node stores a summary
array of size $\sqrt{u}$, in which each entry contains a bit. From the index of each
entry, we can compute the starting index of the subarray of size $\sqrt{u}$ that the bit
summarizes. In the proto-vEB structure, we use explicit pointers rather than index
calculations. The array summary contains the summary bits stored recursively in a
proto-vEB structure, and the array cluster contains $\sqrt{u}$ pointers.

Figure 20.4 A proto-vEB(16) structure representing the set {2, 3, 4, 5, 7, 14, 15}. It points to four
proto-vEB(4) structures in cluster[0 .. 3], and to a summary structure, which is also a proto-vEB(4).
Each proto-vEB(4) structure points to two proto-vEB(2) structures in cluster[0 .. 1], and to a
proto-vEB(2) summary. Each proto-vEB(2) structure contains just an array A[0 .. 1] of two bits.
The proto-vEB(2) structures above “elements i,j” store bits i and j of the actual dynamic set, and
the proto-vEB(2) structures above “clusters i,j” store the summary bits for clusters i and j in the
top-level proto-vEB(16) structure. For clarity, heavy shading indicates the top level of a proto-vEB
structure that stores summary information for its parent structure; such a proto-vEB structure is
otherwise identical to any other proto-vEB structure with the same universe size.

Figure 20.4 shows a fully expanded proto-vEB(16) structure representing the
set {2, 3, 4, 5, 7, 14, 15}. If the value i is in the proto-vEB structure pointed to by
summary, then the ith cluster contains some value in the set being represented.
As in the tree of constant height, cluster[i] represents the values $i\sqrt{u}$ through
$(i + 1)\sqrt{u} - 1$, which form the ith cluster.

At the base level, the elements of the actual dynamic sets are stored in some
of the proto-vEB(2) structures, and the remaining proto-vEB(2) structures store
summary bits. Beneath each of the non-summary base structures, the figure
indicates which bits it stores. For example, the proto-vEB(2) structure labeled
“elements 6,7” stores bit 6 (0, since element 6 is not in the set) in its A[0] and
bit 7 (1, since element 7 is in the set) in its A[1].

Like the clusters, each summary is just a dynamic set with universe size $\sqrt{u}$,
and so we represent each summary as a proto-vEB($\sqrt{u}$) structure. The four
summary bits for the main proto-vEB(16) structure are in the leftmost proto-vEB(4)
structure, and they ultimately appear in two proto-vEB(2) structures. For
example, the proto-vEB(2) structure labeled “clusters 2,3” has A[0] = 0, indicating that
cluster 2 of the proto-vEB(16) structure (containing elements 8, 9, 10, 11) is all 0,
and A[1] = 1, telling us that cluster 3 (containing elements 12, 13, 14, 15) has at
least one 1. Each proto-vEB(4) structure points to its own summary, which is itself
stored as a proto-vEB(2) structure. For example, look at the proto-vEB(2)
structure just to the left of the one labeled “elements 0,1.” Because its A[0] is 0, it tells
us that the “elements 0,1” structure is all 0, and because its A[1] is 1, we know that
the “elements 2,3” structure contains at least one 1.

20.2.2  Operations  on  a  proto  van  Emde  Boas  structure 

We shall now describe how to perform operations on a proto-vEB structure.
We first examine the query operations (Member, Minimum, and Successor),
which do not change the proto-vEB structure. We then discuss Insert and
Delete. We leave Maximum and Predecessor, which are symmetric to Minimum
and Successor, respectively, as Exercise 20.2-1.

Each of the Member, Successor, Predecessor, Insert, and Delete operations
takes a parameter x, along with a proto-vEB structure V. Each of these
operations assumes that $0 \le x < V.u$.

Determining  whether  a  value  is  in  the  set 

To perform Member(x), we need to find the bit corresponding to x within the
appropriate proto-vEB(2) structure. We can do so in $O(\lg \lg u)$ time, bypassing
the summary structures altogether. The following procedure takes a proto-vEB
structure V and a value x, and it returns a bit indicating whether x is in the
dynamic set held by V.

Proto-vEB-Member(V, x)
1  if V.u == 2
2      return V.A[x]
3  else return Proto-vEB-Member(V.cluster[high(x)], low(x))

The Proto-vEB-Member procedure works as follows. Line 1 tests whether
we are in a base case, where V is a proto-vEB(2) structure. Line 2 handles the
base case, simply returning the appropriate bit of array A. Line 3 deals with the
recursive case, “drilling down” into the appropriate smaller proto-vEB structure.
The value high(x) says which proto-vEB($\sqrt{u}$) structure we visit, and low(x)
determines which element within that proto-vEB($\sqrt{u}$) structure we are querying.

Let’s see what happens when we call Proto-vEB-Member(V, 6) on the
proto-vEB(16) structure in Figure 20.4. Since high(6) = 1 when u = 16, we
recurse into the proto-vEB(4) structure in the upper right, and we ask about
element low(6) = 2 of that structure. In this recursive call, u = 4, and so we
recurse again. With u = 4, we have high(2) = 1 and low(2) = 0, and so we ask
about element 0 of the proto-vEB(2) structure in the upper right. This recursive
call turns out to be a base case, and so it returns A[0] = 0 back up through the
chain of recursive calls. Thus, we get the result that Proto-vEB-Member(V, 6)
returns 0, indicating that 6 is not in the set.
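
The walkthrough above can be reproduced with a small Python sketch. It is only an illustration under stated assumptions: the class name ProtoVEB, the attribute root (holding $\sqrt{u}$), and the helper set_bit used to populate the structure are hypothetical names that do not appear in the text, and u is assumed to be of the form 2, 4, 16, 256, and so on.

```python
import math


class ProtoVEB:
    """A proto-vEB(u) structure; u is assumed to be 2, 4, 16, 256, ..."""

    def __init__(self, u):
        self.u = u
        if u == 2:
            self.A = [0, 0]            # base case: two bits
        else:
            self.root = math.isqrt(u)  # sqrt(u), an exact integer here
            self.summary = ProtoVEB(self.root)
            self.cluster = [ProtoVEB(self.root) for _ in range(self.root)]


def high(V, x):
    return x // V.root


def low(V, x):
    return x % V.root


def set_bit(V, x):
    """Hypothetical helper, used only to populate the demo structure."""
    if V.u == 2:
        V.A[x] = 1
    else:
        set_bit(V.cluster[high(V, x)], low(V, x))
        set_bit(V.summary, high(V, x))


def member(V, x):
    """Follows Proto-vEB-Member: one recursive call, summaries unused."""
    if V.u == 2:
        return V.A[x]
    return member(V.cluster[high(V, x)], low(V, x))


V = ProtoVEB(16)
for key in (2, 3, 4, 5, 7, 14, 15):
    set_bit(V, key)

print(member(V, 6))  # 0, matching the walkthrough
print(member(V, 7))  # 1
```

Note that member never touches V.summary, which is why it needs only one recursive call per level.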

To determine the running time of Proto-vEB-Member, let T(u) denote
its running time on a proto-vEB(u) structure. Each recursive call takes
constant time, not including the time taken by the recursive calls that it makes.
When Proto-vEB-Member makes a recursive call, it makes a call on a
proto-vEB($\sqrt{u}$) structure. Thus, we can characterize the running time by the
recurrence $T(u) = T(\sqrt{u}) + O(1)$, which we have already seen as recurrence
(20.2). Its solution is $T(u) = O(\lg \lg u)$, and so we conclude that
Proto-vEB-Member runs in time $O(\lg \lg u)$.

Finding  the  minimum  element 

Now we examine how to perform the Minimum operation. The procedure
Proto-vEB-Minimum(V) returns the minimum element in the proto-vEB
structure V, or NIL if V represents an empty set.

Proto-vEB-Minimum(V)
 1  if V.u == 2
 2      if V.A[0] == 1
 3          return 0
 4      elseif V.A[1] == 1
 5          return 1
 6      else return NIL
 7  else min-cluster = Proto-vEB-Minimum(V.summary)
 8      if min-cluster == NIL
 9          return NIL
10      else offset = Proto-vEB-Minimum(V.cluster[min-cluster])
11          return index(min-cluster, offset)

This procedure works as follows. Line 1 tests for the base case, which lines 2-6
handle by brute force. Lines 7-11 handle the recursive case. First, line 7 finds the
number of the first cluster that contains an element of the set. It does so by
recursively calling Proto-vEB-Minimum on V.summary, which is a
proto-vEB($\sqrt{u}$) structure. Line 7 assigns this cluster number to the variable
min-cluster. If the set is empty, then the recursive call returned NIL, and line 9
returns NIL. Otherwise, the minimum element of the set is somewhere in cluster
number min-cluster. The recursive call in line 10 finds the offset within the
cluster of the minimum element in this cluster. Finally, line 11 constructs the
value of the minimum element from the cluster number and offset, and it returns
this value.
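
As a rough illustration of the procedure just described, the following Python sketch implements the same two-call recursion and counts how many invocations occur on a nonempty structure. The names ProtoVEB, root, insert, and calls_for are assumptions made for this example only; the counter is passed as a one-element list so that the recursion can update it.

```python
import math


class ProtoVEB:
    """A proto-vEB(u) structure; u is assumed to be 2, 4, 16, 256, ..."""

    def __init__(self, u):
        self.u = u
        if u == 2:
            self.A = [0, 0]
        else:
            self.root = math.isqrt(u)
            self.summary = ProtoVEB(self.root)
            self.cluster = [ProtoVEB(self.root) for _ in range(self.root)]


def insert(V, x):
    """Populate the structure (set the element bit and the summary bit)."""
    if V.u == 2:
        V.A[x] = 1
    else:
        insert(V.cluster[x // V.root], x % V.root)
        insert(V.summary, x // V.root)


def minimum(V, counter):
    """Follows Proto-vEB-Minimum; counter[0] tallies the invocations."""
    counter[0] += 1
    if V.u == 2:
        if V.A[0] == 1:
            return 0
        if V.A[1] == 1:
            return 1
        return None
    min_cluster = minimum(V.summary, counter)          # first recursive call
    if min_cluster is None:
        return None
    offset = minimum(V.cluster[min_cluster], counter)  # second recursive call
    return min_cluster * V.root + offset


def calls_for(u):
    """Return (minimum element, number of calls) for the set {u - 1}."""
    V = ProtoVEB(u)
    insert(V, u - 1)
    counter = [0]
    return minimum(V, counter), counter[0]


for u in (4, 16, 256):
    print(u, calls_for(u))
```

On these inputs the call counts are 3, 7, and 15, that is, 2 lg u - 1: squaring the universe size roughly doubles the work, so the number of calls grows with lg u rather than with lg lg u.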

Although querying the summary information allows us to quickly find the
cluster containing the minimum element, because this procedure makes two
recursive calls on proto-vEB($\sqrt{u}$) structures, it does not run in
$\Theta(\lg \lg u)$ time in the worst case. Letting T(u) denote the worst-case
time for Proto-vEB-Minimum on a proto-vEB(u) structure, we have the recurrence

$T(u) = 2T(\sqrt{u}) + O(1)$ .  (20.3)

Again, we use a change of variables to solve this recurrence, letting
$m = \lg u$, which gives

$T(2^m) = 2T(2^{m/2}) + O(1)$ .

Renaming $S(m) = T(2^m)$ gives

$S(m) = 2S(m/2) + O(1)$ ,

which, by case 1 of the master method, has the solution $S(m) = \Theta(m)$. By
changing back from S(m) to T(u), we have that
$T(u) = T(2^m) = S(m) = \Theta(m) = \Theta(\lg u)$. Thus, we see that because of
the second recursive call, Proto-vEB-Minimum runs in $\Theta(\lg u)$ time rather
than the desired $O(\lg \lg u)$ time.

Finding the successor

The Successor operation is even worse. In the worst case, it makes two recursive
calls, along with a call to Proto-vEB-Minimum. The procedure
Proto-vEB-Successor(V, x) returns the smallest element in the proto-vEB
structure V that is greater than x, or NIL if no element in V is greater than x.
It does not require x to be a member of the set, but it does assume that
$0 \le x < V.u$.

Proto-vEB-Successor(V, x)
 1  if V.u == 2
 2      if x == 0 and V.A[1] == 1
 3          return 1
 4      else return NIL
 5  else offset = Proto-vEB-Successor(V.cluster[high(x)], low(x))
 6      if offset ≠ NIL
 7          return index(high(x), offset)
 8      else succ-cluster = Proto-vEB-Successor(V.summary, high(x))
 9          if succ-cluster == NIL
10              return NIL
11          else offset = Proto-vEB-Minimum(V.cluster[succ-cluster])
12              return index(succ-cluster, offset)

The Proto-vEB-Successor procedure works as follows. As usual, line 1
tests for the base case, which lines 2-4 handle by brute force: the only way that x
can have a successor within a proto-vEB(2) structure is when x = 0 and A[1]
is 1. Lines 5-12 handle the recursive case. Line 5 searches for a successor to x
within x’s cluster, assigning the result to offset. Line 6 determines whether x has
a successor within its cluster; if it does, then line 7 computes and returns the value
of this successor. Otherwise, we have to search in other clusters. Line 8 assigns to
succ-cluster the number of the next nonempty cluster, using the summary
information to find it. Line 9 tests whether succ-cluster is NIL, with line 10
returning NIL if all succeeding clusters are empty. If succ-cluster is non-NIL,
line 11 assigns the first element within that cluster to offset, and line 12
computes and returns the minimum element in that cluster.
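
The control flow described above might be sketched in Python as follows. This is an illustrative translation, not a definitive implementation: ProtoVEB, root, and the populating helper insert are names assumed for the example, and None plays the role of NIL.

```python
import math


class ProtoVEB:
    """A proto-vEB(u) structure; u is assumed to be 2, 4, 16, 256, ..."""

    def __init__(self, u):
        self.u = u
        if u == 2:
            self.A = [0, 0]
        else:
            self.root = math.isqrt(u)
            self.summary = ProtoVEB(self.root)
            self.cluster = [ProtoVEB(self.root) for _ in range(self.root)]


def high(V, x):
    return x // V.root


def low(V, x):
    return x % V.root


def index(V, h, l):
    return h * V.root + l


def insert(V, x):
    if V.u == 2:
        V.A[x] = 1
    else:
        insert(V.cluster[high(V, x)], low(V, x))
        insert(V.summary, high(V, x))


def minimum(V):
    if V.u == 2:
        if V.A[0] == 1:
            return 0
        return 1 if V.A[1] == 1 else None
    min_cluster = minimum(V.summary)
    if min_cluster is None:
        return None
    return index(V, min_cluster, minimum(V.cluster[min_cluster]))


def successor(V, x):
    """Follows Proto-vEB-Successor; returns None when x has no successor."""
    if V.u == 2:
        return 1 if x == 0 and V.A[1] == 1 else None
    # First look inside x's own cluster.
    offset = successor(V.cluster[high(V, x)], low(V, x))
    if offset is not None:
        return index(V, high(V, x), offset)
    # Otherwise ask the summary for the next nonempty cluster.
    succ_cluster = successor(V.summary, high(V, x))
    if succ_cluster is None:
        return None
    return index(V, succ_cluster, minimum(V.cluster[succ_cluster]))


V = ProtoVEB(16)
for key in (2, 3, 4, 5, 7, 14, 15):
    insert(V, key)

print(successor(V, 5), successor(V, 7), successor(V, 15))  # 7 14 None
```

The query successor(V, 7) exercises the expensive path: the search inside cluster 1 fails, the summary is consulted, and a separate minimum call runs on cluster 3.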

In the worst case, Proto-vEB-Successor calls itself recursively twice on
proto-vEB($\sqrt{u}$) structures, and it makes one call to Proto-vEB-Minimum on
a proto-vEB($\sqrt{u}$) structure. Thus, the recurrence for the worst-case running
time T(u) of Proto-vEB-Successor is

$T(u) = 2T(\sqrt{u}) + \Theta(\lg \sqrt{u})$
$\phantom{T(u)} = 2T(\sqrt{u}) + \Theta(\lg u)$ .

We can employ the same technique that we used for recurrence (20.1) to show
that this recurrence has the solution $T(u) = \Theta(\lg u \lg \lg u)$. Thus,
Proto-vEB-Successor is asymptotically slower than Proto-vEB-Minimum.
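
For readers who want the intermediate steps, the solution can be sketched as follows, assuming the technique in question is the same change of variables m = lg u used for recurrence (20.3):

```latex
\begin{align*}
T(u)   &= 2T(\sqrt{u}) + \Theta(\lg u) \\
T(2^m) &= 2T(2^{m/2}) + \Theta(m)
        && \text{(substituting } m = \lg u\text{)} \\
S(m)   &= 2S(m/2) + \Theta(m)
        && \text{(renaming } S(m) = T(2^m)\text{)} \\
S(m)   &= \Theta(m \lg m)
        && \text{(case 2 of the master method)} \\
T(u)   &= \Theta(\lg u \, \lg \lg u)
        && \text{(changing back from } S(m) \text{ to } T(u)\text{)}
\end{align*}
```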

Inserting  an  element 

To insert an element, we need to insert it into the appropriate cluster and also set
the summary bit for that cluster to 1. The procedure Proto-vEB-Insert(V, x)
inserts the value x into the proto-vEB structure V.

Proto-vEB-Insert(V, x)
1  if V.u == 2
2      V.A[x] = 1
3  else Proto-vEB-Insert(V.cluster[high(x)], low(x))
4      Proto-vEB-Insert(V.summary, high(x))

In the base case, line 2 sets the appropriate bit in the array A to 1. In the recursive
case, the recursive call in line 3 inserts x into the appropriate cluster, and line 4
sets the summary bit for that cluster to 1.

Because Proto-vEB-Insert makes two recursive calls in the worst case,
recurrence (20.3) characterizes its running time. Hence, Proto-vEB-Insert runs
in $\Theta(\lg u)$ time.
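
A minimal Python sketch of this insertion, again using the assumed names ProtoVEB and root, shows the effect on the summary: after a single insertion, exactly one summary bit of the top-level structure is set.

```python
import math


class ProtoVEB:
    """A proto-vEB(u) structure; u is assumed to be 2, 4, 16, 256, ..."""

    def __init__(self, u):
        self.u = u
        if u == 2:
            self.A = [0, 0]
        else:
            self.root = math.isqrt(u)
            self.summary = ProtoVEB(self.root)
            self.cluster = [ProtoVEB(self.root) for _ in range(self.root)]


def insert(V, x):
    """Follows Proto-vEB-Insert: two recursive calls when u > 2."""
    if V.u == 2:
        V.A[x] = 1
    else:
        insert(V.cluster[x // V.root], x % V.root)  # element into its cluster
        insert(V.summary, x // V.root)              # cluster number into summary


def member(V, x):
    if V.u == 2:
        return V.A[x]
    return member(V.cluster[x // V.root], x % V.root)


V = ProtoVEB(16)
insert(V, 14)                     # 14 lives in cluster high(14) = 3

# Which clusters does the top-level summary now report as nonempty?
nonempty = [i for i in range(4) if member(V.summary, i) == 1]
print(nonempty)                   # [3]
print(member(V, 14), member(V, 13))
```

Because each level performs both recursive calls unconditionally, inserting an element that is already present costs just as much as inserting a new one.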

Deleting  an  element 

The Delete operation is more complicated than insertion. Whereas we can always
set a summary bit to 1 when inserting, we cannot always reset the same summary
bit to 0 when deleting. We need to determine whether any bit in the appropriate
cluster is 1. As we have defined proto-vEB structures, we would have to examine
all $\sqrt{u}$ bits within a cluster to determine whether any of them are 1.
Alternatively, we could add an attribute n to the proto-vEB structure, counting
how many elements it has. We leave implementation of Proto-vEB-Delete as
Exercises 20.2-2 and 20.2-3.


Clearly,  we  need  to  modify  the  proto-vEB  structure  to  get  each  operation  down 
to  making  at  most  one  recursive  call.  We  will  see  in  the  next  section  how  to  do  so. 

Exercises 


20.2-1 

Write pseudocode for the procedures Proto-vEB-Maximum and
Proto-vEB-Predecessor.


20.2-2

Write pseudocode for Proto-vEB-Delete. It should update the appropriate
summary bit by scanning the related bits within the cluster. What is the
worst-case running time of your procedure?


20.2-3 

Add the attribute n to each proto-vEB structure, giving the number of elements
currently in the set it represents, and write pseudocode for Proto-vEB-Delete
that uses the attribute n to decide when to reset summary bits to 0. What is the
worst-case running time of your procedure? What other procedures need to change
because of the new attribute? Do these changes affect their running times?


20.2-4 

Modify  the  proto-vEB  structure  to  support  duplicate  keys. 


20.2-5 

Modify  the  proto-vEB  structure  to  support  keys  that  have  associated  satellite  data. 


20.2-6 

Write pseudocode for a procedure that creates a proto-vEB(u) structure.


20.2-7 

Argue that if line 9 of Proto-vEB-Minimum is executed, then the proto-vEB
structure is empty.


20.2-8 

Suppose that we designed a proto-vEB structure in which each cluster array had
only $u^{1/4}$ elements. What would the running times of each operation be?


20.3  The  van  Emde  Boas  tree 

The proto-vEB structure of the previous section is close to what we need to achieve
$O(\lg \lg u)$ running times. It falls short because we have to recurse too many
times in most of the operations. In this section, we shall design a data structure
that is similar to the proto-vEB structure but stores a little more information,
thereby removing the need for some of the recursion.

In Section 20.2, we observed that the assumption that we made about the
universe size (that $u = 2^{2^k}$ for some integer k) is unduly restrictive,
confining the possible values of u to an overly sparse set. From this point on,
therefore, we will allow the universe size u to be any exact power of 2, and when
$\sqrt{u}$ is not an integer (that is, if u is an odd power of 2, so that
$u = 2^{2k+1}$ for some integer $k \ge 0$), then we will divide the lg u bits of a
number into the most significant $\lceil (\lg u)/2 \rceil$ bits and the least
significant $\lfloor (\lg u)/2 \rfloor$ bits. For convenience, we denote
$2^{\lceil (\lg u)/2 \rceil}$ (the “upper square root” of u) by
$\sqrt[\uparrow]{u}$ and $2^{\lfloor (\lg u)/2 \rfloor}$ (the “lower square root”
of u) by $\sqrt[\downarrow]{u}$, so that
$u = \sqrt[\uparrow]{u} \cdot \sqrt[\downarrow]{u}$ and, when u is an even power
of 2 ($u = 2^{2k}$ for some integer k),
$\sqrt[\uparrow]{u} = \sqrt[\downarrow]{u} = \sqrt{u}$. Because we now allow u to
be an odd power of 2, we must redefine our helpful functions from Section 20.2:

$\mathrm{high}(x) = \lfloor x / \sqrt[\downarrow]{u} \rfloor$ ,
$\mathrm{low}(x) = x \bmod \sqrt[\downarrow]{u}$ ,
$\mathrm{index}(x, y) = x \sqrt[\downarrow]{u} + y$ .

Figure 20.5 The information in a vEB(u) tree when u > 2. The structure contains
the universe size u, elements min and max, a pointer summary to a
vEB($\sqrt[\uparrow]{u}$) tree, and an array
cluster[0 .. $\sqrt[\uparrow]{u} - 1$] of $\sqrt[\uparrow]{u}$ pointers to
vEB($\sqrt[\downarrow]{u}$) trees.
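
To make the redefined helper functions concrete, here is a small Python sketch. The function names upper_sqrt and lower_sqrt are assumptions chosen for this example, and the universe size u is passed explicitly rather than read from a tree; u is assumed to be an exact power of 2.

```python
def upper_sqrt(u):
    """2^ceil((lg u)/2), the "upper square root" of a power of 2."""
    lg_u = u.bit_length() - 1          # exact lg u when u is a power of 2
    return 1 << ((lg_u + 1) // 2)


def lower_sqrt(u):
    """2^floor((lg u)/2), the "lower square root" of a power of 2."""
    lg_u = u.bit_length() - 1
    return 1 << (lg_u // 2)


def high(x, u):
    return x // lower_sqrt(u)          # most significant ceil((lg u)/2) bits


def low(x, u):
    return x % lower_sqrt(u)           # least significant floor((lg u)/2) bits


def index(h, l, u):
    return h * lower_sqrt(u) + l


u = 32                                  # an odd power of 2: lg u = 5
print(upper_sqrt(u), lower_sqrt(u))     # 8 4
print(high(27, u), low(27, u))          # 6 3
print(index(6, 3, u))                   # 27
```

With u = 32 there are 8 clusters of 4 elements each, so the cluster number ranges over the upper square root while the offset ranges over the lower square root.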

20.3.1  van  Emde  Boas  trees 

The van Emde Boas tree, or vEB tree, modifies the proto-vEB structure. We
denote a vEB tree with a universe size of u as vEB(u) and, unless u equals the
base size of 2, the attribute summary points to a vEB($\sqrt[\uparrow]{u}$) tree and
the array cluster[0 .. $\sqrt[\uparrow]{u} - 1$] points to $\sqrt[\uparrow]{u}$
vEB($\sqrt[\downarrow]{u}$) trees. As Figure 20.5 illustrates, a
vEB tree contains two attributes not found in a proto-vEB structure:

•  min  stores  the  minimum  element  in  the  vEB  tree,  and 

•  max  stores  the  maximum  element  in  the  vEB  tree. 

Furthermore, the element stored in min does not appear in any of the recursive
vEB($\sqrt[\downarrow]{u}$) trees that the cluster array points to. The elements
stored in a vEB(u) tree V, therefore, are V.min plus all the elements recursively
stored in the vEB($\sqrt[\downarrow]{u}$) trees pointed to by
V.cluster[0 .. $\sqrt[\uparrow]{u} - 1$]. Note that when a vEB tree contains two
or more elements, we treat min and max differently: the element stored in min
does not appear in any of the clusters, but the element stored in max does.

Since the base size is 2, a vEB(2) tree does not need the array A that the
corresponding proto-vEB(2) structure has. Instead, we can determine its elements
from its min and max attributes. In a vEB tree with no elements, regardless of its
universe size u, both min and max are NIL.

Figure 20.6 shows a vEB(16) tree V holding the set {2, 3, 4, 5, 7, 14, 15}.
Because the smallest element is 2, V.min equals 2, and even though high(2) = 0, the
element 2 does not appear in the vEB(4) tree pointed to by V.cluster[0]: notice
that V.cluster[0].min equals 3, and so 2 is not in this vEB tree. Similarly, since
V.cluster[0].min equals 3, and 2 and 3 are the only elements in V.cluster[0], the
vEB(2) clusters within V.cluster[0] are empty.

The  min  and  max  attributes  will  turn  out  to  be  key  to  reducing  the  number  of 
recursive  calls  within  the  operations  on  vEB  trees.  These  attributes  will  help  us  in 
four  ways: 

1. The Minimum and Maximum operations do not even need to recurse, for they
can just return the values of min or max.

2. The Successor operation can avoid making a recursive call to determine
whether the successor of a value x lies within cluster number high(x). That is
because x’s successor lies within its cluster if and only if x is strictly less
than the max attribute of its cluster. A symmetric argument holds for
Predecessor and min.

3. We can tell whether a vEB tree has no elements, exactly one element, or at least
two elements in constant time from its min and max values. This ability will
help in the Insert and Delete operations. If min and max are both NIL, then
the vEB tree has no elements. If min and max are non-NIL but are equal to each
other, then the vEB tree has exactly one element. Otherwise, both min and max
are non-NIL but are unequal, and the vEB tree has two or more elements.

4. If we know that a vEB tree is empty, we can insert an element into it by updating
only its min and max attributes. Hence, we can insert into an empty vEB tree in
constant time. Similarly, if we know that a vEB tree has only one element, we
can delete that element in constant time by updating only min and max. These
properties will allow us to cut short the chain of recursive calls.

Even if the universe size u is an odd power of 2, the difference in the sizes
of the summary vEB tree and the clusters will not turn out to affect the asymptotic
running times of the vEB-tree operations. The recursive procedures that implement
the vEB-tree operations will all have running times characterized by the recurrence

$T(u) \le T(\sqrt[\uparrow]{u}) + O(1)$ .  (20.4)

This recurrence looks similar to recurrence (20.2), and we will solve it in a similar
fashion. Letting $m = \lg u$, we rewrite it as

$T(2^m) \le T(2^{\lceil m/2 \rceil}) + O(1)$ .

Noting that $\lceil m/2 \rceil \le 2m/3$ for all $m \ge 2$, we have

$T(2^m) \le T(2^{2m/3}) + O(1)$ .

Letting $S(m) = T(2^m)$, we rewrite this last recurrence as

$S(m) \le S(2m/3) + O(1)$ ,

which, by case 2 of the master method, has the solution $S(m) = O(\lg m)$. (In
terms of the asymptotic solution, the fraction 2/3 does not make any difference
compared with the fraction 1/2, because when we apply the master method, we
find that $\log_{3/2} 1 = \log_2 1 = 0$.) Thus, we have
$T(u) = T(2^m) = S(m) = O(\lg m) = O(\lg \lg u)$.

Figure 20.6 A vEB(16) tree corresponding to the proto-vEB structure in Figure 20.4.
It stores the set {2, 3, 4, 5, 7, 14, 15}. Slashes indicate NIL values. The value
stored in the min attribute of a vEB tree does not appear in any of its clusters.
Heavy shading serves the same purpose here as in Figure 20.4.

Before using a van Emde Boas tree, we must know the universe size u, so that
we can create a van Emde Boas tree of the appropriate size that initially represents
an empty set. As Problem 20-1 asks you to show, the total space requirement of
a van Emde Boas tree is $O(u)$, and it is straightforward to create an empty tree
in $O(u)$ time. In contrast, we can create an empty red-black tree in constant time.
Therefore, we might not want to use a van Emde Boas tree when we perform only
a small number of operations, since the time to create the data structure would
exceed the time saved in the individual operations. This drawback is usually not
significant, since we typically use a simple data structure, such as an array or linked
list, to represent a set with only a few elements.
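
One way to create an empty tree is sketched below in Python. The class name VEB, the attributes upper and lower (the upper and lower square roots of u), and the helper count_nodes are assumptions made for illustration; the sketch simply allocates every subtree eagerly, which is where the up-front cost comes from.

```python
class VEB:
    """An empty vEB(u) tree; u is assumed to be an exact power of 2, u >= 2."""

    def __init__(self, u):
        self.u = u
        self.min = None                 # None plays the role of NIL
        self.max = None
        if u > 2:
            lg_u = u.bit_length() - 1
            self.lower = 1 << (lg_u // 2)        # lower square root of u
            self.upper = 1 << ((lg_u + 1) // 2)  # upper square root of u
            self.summary = VEB(self.upper)
            self.cluster = [VEB(self.lower) for _ in range(self.upper)]


def count_nodes(V):
    """Number of vEB objects allocated for the tree rooted at V."""
    if V.u == 2:
        return 1
    return 1 + count_nodes(V.summary) + sum(count_nodes(c) for c in V.cluster)


for u in (2, 4, 8, 16, 256):
    print(u, count_nodes(VEB(u)))
```

Every node must be allocated before the first operation, whereas an empty red-black tree needs no allocation at all; that is the trade-off the paragraph above describes.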

20.3.2  Operations  on  a  van  Emde  Boas  tree 

We are now ready to see how to perform operations on a van Emde Boas tree. As
we did for the proto van Emde Boas structure, we will consider the querying
operations first, and then Insert and Delete. Due to the slight asymmetry between
the minimum and maximum elements in a vEB tree (when a vEB tree contains
at least two elements, the minimum element does not appear within a cluster but
the maximum element does), we will provide pseudocode for all five querying
operations. As in the operations on proto van Emde Boas structures, the operations
here that take parameters V and x, where V is a van Emde Boas tree and x is an
element, assume that $0 \le x < V.u$.

Finding  the  minimum  and  maximum  elements 

Because we store the minimum and maximum in the attributes min and max, two
of the operations are one-liners, taking constant time:

vEB-Tree-Minimum(V)
1  return V.min

vEB-Tree-Maximum(V)
1  return V.max

Determining  whether  a  value  is  in  the  set 

The procedure vEB-Tree-Member(V, x) has a recursive case like that of
Proto-vEB-Member, but the base case is a little different. We also check
directly whether x equals the minimum or maximum element. Since a vEB tree
doesn’t store bits as a proto-vEB structure does, we design vEB-Tree-Member
to return TRUE or FALSE rather than 1 or 0.

vEB-Tree-Member(V, x)
1  if x == V.min or x == V.max
2      return TRUE
3  elseif V.u == 2
4      return FALSE
5  else return vEB-Tree-Member(V.cluster[high(x)], low(x))

Line 1 checks to see whether x equals either the minimum or maximum element.
If it does, line 2 returns TRUE. Otherwise, line 3 tests for the base case. Since
a vEB(2) tree has no elements other than those in min and max, if it is the base
case, line 4 returns FALSE. The other possibility (it is not a base case and x
equals neither min nor max) is handled by the recursive call in line 5.

Recurrence (20.4) characterizes the running time of the vEB-Tree-Member
procedure, and so this procedure takes $O(\lg \lg u)$ time.

Finding  the  successor  and  predecessor 

Next we see how to implement the Successor operation. Recall that the
procedure Proto-vEB-Successor(V, x) could make two recursive calls: one to
determine whether x’s successor resides in the same cluster as x and, if it does
not, one to find the cluster containing x’s successor. Because we can access the
maximum value in a vEB tree quickly, we can avoid making two recursive calls,
and instead make one recursive call on either a cluster or on the summary, but not
on both.

vEB-Tree-Successor(V, x)
 1  if V.u == 2
 2      if x == 0 and V.max == 1
 3          return 1
 4      else return NIL
 5  elseif V.min ≠ NIL and x < V.min
 6      return V.min
 7  else max-low = vEB-Tree-Maximum(V.cluster[high(x)])
 8      if max-low ≠ NIL and low(x) < max-low
 9          offset = vEB-Tree-Successor(V.cluster[high(x)], low(x))
10          return index(high(x), offset)
11      else succ-cluster = vEB-Tree-Successor(V.summary, high(x))
12          if succ-cluster == NIL
13              return NIL
14          else offset = vEB-Tree-Minimum(V.cluster[succ-cluster])
15              return index(succ-cluster, offset)

This procedure has six return statements and several cases. We start with the
base case in lines 2-4, which returns 1 in line 3 if we are trying to find the successor
of 0 and 1 is in the 2-element set; otherwise, the base case returns NIL in line 4.

If we are not in the base case, we next check in line 5 whether x is strictly less
than the minimum element. If so, then we simply return the minimum element in
line 6.

If we get to line 7, then we know that we are not in a base case and that x is
greater than or equal to the minimum value in the vEB tree V. Line 7 assigns to
max-low the maximum element in x’s cluster. If x’s cluster contains some element
that is greater than x, then we know that x’s successor lies somewhere within x’s
cluster. Line 8 tests for this condition. If x’s successor is within x’s cluster, then
line 9 determines where in the cluster it is, and line 10 returns the successor in the
same way as line 7 of Proto-vEB-Successor.

We get to line 11 if x is greater than or equal to the greatest element in its
cluster. In this case, lines 11-15 find x’s successor in the same way as lines 8-12
of Proto-vEB-Successor.

It is easy to see how recurrence (20.4) characterizes the running time of
vEB-Tree-Successor. Depending on the result of the test in line 8, the procedure
calls itself recursively in either line 9 (on a vEB tree with universe size
$\sqrt[\downarrow]{u}$) or line 11 (on a vEB tree with universe size
$\sqrt[\uparrow]{u}$). In either case, the one recursive call is on a vEB tree with
universe size at most $\sqrt[\uparrow]{u}$. The remainder of the procedure,
including the calls to vEB-Tree-Minimum and vEB-Tree-Maximum, takes $O(1)$ time.
Hence, vEB-Tree-Successor runs in $O(\lg \lg u)$ worst-case time.

The vEB-Tree-Predecessor procedure is symmetric to the vEB-Tree-Successor
procedure, but with one additional case:

vEB-Tree-Predecessor(V, x)
 1  if V.u == 2
 2      if x == 1 and V.min == 0
 3          return 0
 4      else return NIL
 5  elseif V.max ≠ NIL and x > V.max
 6      return V.max
 7  else min-low = vEB-Tree-Minimum(V.cluster[high(x)])
 8      if min-low ≠ NIL and low(x) > min-low
 9          offset = vEB-Tree-Predecessor(V.cluster[high(x)], low(x))
10          return index(high(x), offset)
11      else pred-cluster = vEB-Tree-Predecessor(V.summary, high(x))
12          if pred-cluster == NIL
13              if V.min ≠ NIL and x > V.min
14                  return V.min
15              else return NIL
16          else offset = vEB-Tree-Maximum(V.cluster[pred-cluster])
17              return index(pred-cluster, offset)

Lines 13-14 form the additional case. This case occurs when x’s predecessor,
if it exists, does not reside in x’s cluster. In vEB-Tree-Successor, we were
assured that if x’s successor resides outside of x’s cluster, then it must reside in
a higher-numbered cluster. But if x’s predecessor is the minimum value in vEB
tree V, then the predecessor resides in no cluster at all. Line 13 checks for this
condition, and line 14 returns the minimum value as appropriate.

This extra case does not affect the asymptotic running time of
vEB-Tree-Predecessor when compared with vEB-Tree-Successor, and so
vEB-Tree-Predecessor runs in $O(\lg \lg u)$ worst-case time.
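
The two procedures can be sketched together in Python as follows. The class name VEB and its attributes lower and upper are assumptions for this example, None stands for NIL, and the tree is populated with an insertion routine modeled on the vEB-Tree-Insert procedure of this section.

```python
class VEB:
    def __init__(self, u):
        self.u, self.min, self.max = u, None, None
        if u > 2:
            lg_u = u.bit_length() - 1
            self.lower = 1 << (lg_u // 2)
            self.upper = 1 << ((lg_u + 1) // 2)
            self.summary = VEB(self.upper)
            self.cluster = [VEB(self.lower) for _ in range(self.upper)]


def high(V, x): return x // V.lower
def low(V, x): return x % V.lower
def index(V, h, l): return h * V.lower + l


def insert(V, x):
    """Assumes x is not already in V."""
    if V.min is None:
        V.min = V.max = x
        return
    if x < V.min:
        x, V.min = V.min, x
    if V.u > 2:
        c = V.cluster[high(V, x)]
        if c.min is None:
            insert(V.summary, high(V, x))
            c.min = c.max = low(V, x)
        else:
            insert(c, low(V, x))
    if x > V.max:
        V.max = x


def successor(V, x):
    if V.u == 2:
        return 1 if x == 0 and V.max == 1 else None
    if V.min is not None and x < V.min:
        return V.min
    max_low = V.cluster[high(V, x)].max
    if max_low is not None and low(V, x) < max_low:
        return index(V, high(V, x), successor(V.cluster[high(V, x)], low(V, x)))
    succ_cluster = successor(V.summary, high(V, x))
    if succ_cluster is None:
        return None
    return index(V, succ_cluster, V.cluster[succ_cluster].min)


def predecessor(V, x):
    if V.u == 2:
        return 0 if x == 1 and V.min == 0 else None
    if V.max is not None and x > V.max:
        return V.max
    min_low = V.cluster[high(V, x)].min
    if min_low is not None and low(V, x) > min_low:
        return index(V, high(V, x), predecessor(V.cluster[high(V, x)], low(V, x)))
    pred_cluster = predecessor(V.summary, high(V, x))
    if pred_cluster is None:
        # Additional case: the predecessor may be V.min, stored in no cluster.
        return V.min if V.min is not None and x > V.min else None
    return index(V, pred_cluster, V.cluster[pred_cluster].max)


V = VEB(16)
for key in (2, 3, 4, 5, 7, 14, 15):
    insert(V, key)
print(successor(V, 7), predecessor(V, 14), predecessor(V, 3))  # 14 7 2
```

The call predecessor(V, 3) exercises the additional case: element 2 is V.min, so no cluster and no summary entry leads to it.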

Inserting  an  element 

Now we examine how to insert an element into a vEB tree. Recall that
Proto-vEB-Insert made two recursive calls: one to insert the element and one to
insert the element’s cluster number into the summary. The vEB-Tree-Insert
procedure will make only one recursive call. How can we get away with just one?
When we insert an element, either the cluster that it goes into already has another
element or it does not. If the cluster already has another element, then the cluster
number is already in the summary, and so we do not need to make that recursive
call. If the cluster does not already have another element, then the element being
inserted becomes the only element in the cluster, and we do not need to recurse to
insert an element into an empty vEB tree:

vEB-Empty-Tree-Insert(V, x)
1  V.min = x
2  V.max = x

With this procedure in hand, here is the pseudocode for vEB-Tree-Insert(V, x),
which assumes that x is not already an element in the set represented by vEB
tree V:

vEB-Tree-Insert(V, x)
 1  if V.min == NIL
 2      vEB-Empty-Tree-Insert(V, x)
 3  else if x < V.min
 4          exchange x with V.min
 5      if V.u > 2
 6          if vEB-Tree-Minimum(V.cluster[high(x)]) == NIL
 7              vEB-Tree-Insert(V.summary, high(x))
 8              vEB-Empty-Tree-Insert(V.cluster[high(x)], low(x))
 9          else vEB-Tree-Insert(V.cluster[high(x)], low(x))
10      if x > V.max
11          V.max = x

This  procedure  works  as  follows.  Line  1  tests  whether  V  is  an  empty  vEB  tree 
and,  if  it  is,  then  line  2  handles  this  easy  case.  Lines  3-11  assume  that  V  is  not 
empty,  and  therefore  some  element  will  be  inserted  into  one  of  F’s  clusters.  But 
that  element  might  not  necessarily  be  the  element  x  passed  to  vEB-Tree-Insert. 
If  x  <  min,  as  tested  in  line  3,  then  x  needs  to  become  the  new  min.  We  don’t 
want  to  lose  the  original  min,  however,  and  so  we  need  to  insert  it  into  one  of  F’s 
clusters.  In  this  case,  line  4  exchanges  x  with  min,  so  that  we  insert  the  original 
min  into  one  of  F’s  clusters. 

We  execute  lines  6-9  only  if  F  is  not  a  base-case  vEB  tree.  Line  6  determines 
whether  the  cluster  that  x  will  go  into  is  currently  empty.  If  so,  then  line  7  in¬ 
serts  x’s  cluster  number  into  the  summary  and  line  8  handles  the  easy  case  of 
inserting  x  into  an  empty  cluster.  If  x’s  cluster  is  not  currently  empty,  then  line  9 
inserts  x  into  its  cluster.  In  this  case,  we  do  not  need  to  update  the  summary, 
since  x ’s  cluster  number  is  already  a  member  of  the  summary. 

Finally, lines 10-11 take care of updating max if x > max. Note that if V is a
base-case vEB tree that is not empty, then lines 3-4 and 10-11 update min and max
properly.

Once again, we can easily see how recurrence (20.4) characterizes the running
time. Depending on the result of the test in line 6, either the recursive call in line 7
(run on a vEB tree with universe size $\sqrt[\uparrow]{u}$) or the recursive call in line 9 (run on
a vEB tree with universe size $\sqrt[\downarrow]{u}$) executes. In either case, the one recursive call is
on a vEB tree with universe size at most $\sqrt[\uparrow]{u}$. Because the remainder of
vEB-TREE-INSERT takes $O(1)$ time, recurrence (20.4) applies, and so the running time
is $O(\lg \lg u)$.
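Recurrence (20.4) is stated earlier in the chapter and is not reproduced in this section. Assuming it has the form $T(u) \le T(\sqrt[\uparrow]{u}) + O(1)$, with $\sqrt[\uparrow]{u} = 2^{\lceil (\lg u)/2 \rceil}$, the following sketch of the usual change of variables shows where the $O(\lg \lg u)$ bound comes from. The names $m$ and $S$ are introduced here only for the derivation.

```latex
\begin{align*}
T(u) &\le T\bigl(\sqrt[\uparrow]{u}\bigr) + O(1)
      && \text{assumed form of (20.4)} \\
T(2^m) &\le T\bigl(2^{\lceil m/2 \rceil}\bigr) + O(1)
      && \text{substituting } m = \lg u \\
S(m) &\le S(\lceil m/2 \rceil) + O(1)
      && \text{renaming } S(m) = T(2^m) \\
S(m) &= O(\lg m)
      && \text{the argument halves at each level} \\
T(u) &= S(\lg u) = O(\lg \lg u) .
\end{align*}
```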

Deleting  an  element 

Finally, we look at how to delete an element from a vEB tree. The procedure
vEB-TREE-DELETE(V, x) assumes that x is currently an element in the set
represented by the vEB tree V.

vEB-TREE-DELETE(V, x)
1   if V.min == V.max
2       V.min = NIL
3       V.max = NIL
4   elseif V.u == 2
5       if x == 0
6           V.min = 1
7       else V.min = 0
8       V.max = V.min
9   else if x == V.min
10          first-cluster = vEB-TREE-MINIMUM(V.summary)
11          x = index(first-cluster,
                    vEB-TREE-MINIMUM(V.cluster[first-cluster]))
12          V.min = x
13      vEB-TREE-DELETE(V.cluster[high(x)], low(x))
14      if vEB-TREE-MINIMUM(V.cluster[high(x)]) == NIL
15          vEB-TREE-DELETE(V.summary, high(x))
16          if x == V.max
17              summary-max = vEB-TREE-MAXIMUM(V.summary)
18              if summary-max == NIL
19                  V.max = V.min
20              else V.max = index(summary-max,
                            vEB-TREE-MAXIMUM(V.cluster[summary-max]))
21      elseif x == V.max
22          V.max = index(high(x),
                    vEB-TREE-MAXIMUM(V.cluster[high(x)]))

The vEB-TREE-DELETE procedure works as follows. If the vEB tree V contains
only one element, then it’s just as easy to delete it as it was to insert an element
into an empty vEB tree: just set min and max to NIL. Lines 1-3 handle this case.
Otherwise, V has at least two elements. Line 4 tests whether V is a base-case vEB
tree and, if so, lines 5-8 set min and max to the one remaining element.

Lines 9-22 assume that V has two or more elements and that $u \ge 4$. In this
case, we will have to delete an element from a cluster. The element we delete from
a cluster might not be x, however, because if x equals min, then once we have
deleted x, some other element within one of V’s clusters becomes the new min,
and we have to delete that other element from its cluster. If the test in line 9 reveals
that we are in this case, then line 10 sets first-cluster to the number of the cluster
that contains the lowest element other than min, and line 11 sets x to the value of
the lowest element in that cluster. This element becomes the new min in line 12
and, because we set x to its value, it is the element that will be deleted from its
cluster.

When we reach line 13, we know that we need to delete element x from its
cluster, whether x was the value originally passed to vEB-TREE-DELETE or x
is the element becoming the new minimum. Line 13 deletes x from its cluster.
That cluster might now become empty, which line 14 tests, and if it does, then
we need to remove x’s cluster number from the summary, which line 15 handles.
After updating the summary, we might need to update max. Line 16 checks to see
whether we are deleting the maximum element in V and, if we are, then line 17 sets
summary-max to the number of the highest-numbered nonempty cluster. (The call
vEB-TREE-MAXIMUM(V.summary) works because we have already recursively
called vEB-TREE-DELETE on V.summary, and therefore V.summary.max has
already been updated as necessary.) If all of V’s clusters are empty, then the only
remaining element in V is min; line 18 checks for this case, and line 19 updates
max appropriately. Otherwise, line 20 sets max to the maximum element in the
highest-numbered cluster. (If this cluster is where the element has been deleted,
we again rely on the recursive call in line 13 having already corrected that cluster’s
max attribute.)

Finally, we have to handle the case in which x’s cluster did not become empty
due to x being deleted. Although we do not have to update the summary in this
case, we might have to update max. Line 21 tests for this case, and if we have to
update max, line 22 does so (again relying on the recursive call to have corrected
max in the cluster).

Now we show that vEB-TREE-DELETE runs in $O(\lg \lg u)$ time in the worst
case. At first glance, you might think that recurrence (20.4) does not always apply,
because a single call of vEB-TREE-DELETE can make two recursive calls: one
on line 13 and one on line 15. Although the procedure can make both recursive
calls, let’s think about what happens when it does. In order for the recursive call on
line 15 to occur, the test on line 14 must show that x’s cluster is empty. The only
way that x’s cluster can be empty is if x was the only element in its cluster when
we made the recursive call on line 13. But if x was the only element in its cluster,
then that recursive call took $O(1)$ time, because it executed only lines 1-3. Thus,
we have two mutually exclusive possibilities:

•  The  recursive  call  on  line  13  took  constant  time. 

•  The  recursive  call  on  line  15  did  not  occur. 

In either case, recurrence (20.4) characterizes the running time of
vEB-TREE-DELETE, and hence its worst-case running time is $O(\lg \lg u)$.
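The insertion and deletion procedures described above can be transcribed into Python fairly directly. The following is a minimal sketch rather than a reference implementation: the class name `VEBTree`, the use of `None` for NIL, and the bit-shift versions of high, low, and index are choices made here, and the universe size u is assumed to be a power of 2. As in the pseudocode, `insert` assumes that x is not already present and `delete` assumes that x is present. Comments give the pseudocode line numbers each branch corresponds to.

```python
class VEBTree:
    """Sketch of a van Emde Boas tree over {0, ..., u-1}, u a power of 2."""

    def __init__(self, u):
        self.u = u
        self.min = None                      # None plays the role of NIL
        self.max = None
        if u > 2:
            bits = u.bit_length() - 1        # lg u
            self.lo_bits = bits // 2         # lg of the lower square root
            self.lo_sqrt = 1 << self.lo_bits
            hi_sqrt = 1 << (bits - self.lo_bits)   # upper square root
            self.summary = VEBTree(hi_sqrt)
            self.cluster = [VEBTree(self.lo_sqrt) for _ in range(hi_sqrt)]

    def high(self, x):
        return x >> self.lo_bits

    def low(self, x):
        return x & (self.lo_sqrt - 1)

    def index(self, h, l):
        return (h << self.lo_bits) | l

    def member(self, x):
        if x == self.min or x == self.max:
            return True
        if self.u == 2:
            return False
        return self.cluster[self.high(x)].member(self.low(x))

    def insert(self, x):
        if self.min is None:                 # lines 1-2: empty tree
            self.min = self.max = x
            return
        if x < self.min:                     # lines 3-4: x becomes the new min
            x, self.min = self.min, x
        if self.u > 2:                       # lines 5-9
            c = self.cluster[self.high(x)]
            if c.min is None:
                self.summary.insert(self.high(x))
                c.min = c.max = self.low(x)  # insertion into an empty cluster
            else:
                c.insert(self.low(x))
        if x > self.max:                     # lines 10-11
            self.max = x

    def delete(self, x):
        if self.min == self.max:             # lines 1-3: only one element
            self.min = self.max = None
        elif self.u == 2:                    # lines 4-8: base case, two elements
            self.min = 1 if x == 0 else 0
            self.max = self.min
        else:
            if x == self.min:                # lines 9-12: pull up the new min
                first = self.summary.min
                x = self.index(first, self.cluster[first].min)
                self.min = x
            h = self.high(x)
            self.cluster[h].delete(self.low(x))        # line 13
            if self.cluster[h].min is None:            # line 14
                self.summary.delete(h)                 # line 15
                if x == self.max:                      # lines 16-20
                    smax = self.summary.max
                    if smax is None:
                        self.max = self.min
                    else:
                        self.max = self.index(smax, self.cluster[smax].max)
            elif x == self.max:                        # lines 21-22
                self.max = self.index(h, self.cluster[h].max)
```

Note that the constructor eagerly allocates every cluster, so building the structure takes time and space proportional to u; only the operations themselves follow the recursive pattern analyzed above.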

Exercises 


20.3-1 

Modify  vEB  trees  to  support  duplicate  keys. 


20.3-2 

Modify  vEB  trees  to  support  keys  that  have  associated  satellite  data. 


20.3-3 

Write  pseudocode  for  a  procedure  that  creates  an  empty  van  Emde  Boas  tree. 


20.3-4 

What happens if you call vEB-TREE-INSERT with an element that is already in
the vEB tree? What happens if you call vEB-TREE-DELETE with an element that
is not in the vEB tree? Explain why the procedures exhibit the behavior that they
do. Show how to modify vEB trees and their operations so that we can check in
constant time whether an element is present.


20.3-5

Suppose that instead of $\sqrt[\uparrow]{u}$ clusters, each with universe size $\sqrt[\downarrow]{u}$, we constructed
vEB trees to have $u^{1/k}$ clusters, each with universe size $u^{1-1/k}$, where $k > 1$ is a
constant. If we were to modify the operations appropriately, what would be their
running times? For the purpose of analysis, assume that $u^{1/k}$ and $u^{1-1/k}$ are always
integers.

20.3-6

Creating a vEB tree with universe size u requires $O(u)$ time. Suppose we wish to
explicitly account for that time. What is the smallest number of operations n for
which the amortized time of each operation in a vEB tree is $O(\lg \lg u)$?


Problems


20-1  Space  requirements  for  van  Emde  Boas  trees 

This problem explores the space requirements for van Emde Boas trees and
suggests a way to modify the data structure to make its space requirement depend on
the number n of elements actually stored in the tree, rather than on the universe
size u. For simplicity, assume that $\sqrt{u}$ is always an integer.

a. Explain why the following recurrence characterizes the space requirement P(u)
of a van Emde Boas tree with universe size u:

$P(u) = (\sqrt{u} + 1)\,P(\sqrt{u}) + \Theta(\sqrt{u})$ .    (20.5)

b. Prove that recurrence (20.5) has the solution $P(u) = O(u)$.

In order to reduce the space requirements, let us define a reduced-space van Emde
Boas tree, or RS-vEB tree, as a vEB tree V but with the following changes:

•  The attribute V.cluster, rather than being stored as a simple array of pointers to
vEB trees with universe size $\sqrt{u}$, is a hash table (see Chapter 11) stored as a
dynamic table (see Section 17.4). Corresponding to the array version of V.cluster,
the hash table stores pointers to RS-vEB trees with universe size $\sqrt{u}$. To find
the ith cluster, we look up the key i in the hash table, so that we can find the
ith cluster by a single search in the hash table.

•  The  hash  table  stores  only  pointers  to  nonempty  clusters.  A  search  in  the  hash 
table  for  an  empty  cluster  returns  NIL,  indicating  that  the  cluster  is  empty. 

•  The attribute V.summary is NIL if all clusters are empty. Otherwise, V.summary
points to an RS-vEB tree with universe size $\sqrt{u}$.

Because  the  hash  table  is  implemented  with  a  dynamic  table,  the  space  it  requires 
is  proportional  to  the  number  of  nonempty  clusters. 
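As a rough illustration of the three changes listed above, the node layout can be sketched in Python. This is only a sketch under stated assumptions: the class name `RSvEBTree` and the helper `get_cluster` are hypothetical, and a built-in `dict` stands in for the hash table stored as a dynamic table. A `dict` grows with the number of keys, but it is not the exact structure of Section 17.4, so the space claim holds only approximately for this stand-in.

```python
class RSvEBTree:
    """Sketch of a reduced-space vEB node (names are illustrative only)."""

    def __init__(self, u):
        self.u = u             # universe size of this node
        self.min = None        # None plays the role of NIL
        self.max = None
        self.summary = None    # NIL while every cluster is empty
        self.cluster = {}      # stand-in hash table: cluster number -> RSvEBTree

    def get_cluster(self, i):
        """Return the i-th cluster, or None if that cluster is empty.

        Only nonempty clusters are stored, so a failed lookup means 'empty'.
        """
        return self.cluster.get(i)
```

A lookup such as `V.get_cluster(3)` performs the single hash-table search the text describes; storing a cluster is an ordinary assignment `V.cluster[3] = RSvEBTree(...)`, which only happens once that cluster receives its first element.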

When we need to insert an element into an empty RS-vEB tree, we create the
RS-vEB tree by calling the following procedure, where the parameter u is the universe
size of the RS-vEB tree:

CREATE-NEW-RS-vEB-TREE(u)
1  allocate a new vEB tree V
2  V.u = u
3  V.min = NIL
4  V.max = NIL
5  V.summary = NIL
6  create V.cluster as an empty dynamic hash table
7  return V

c. Modify the vEB-TREE-INSERT procedure to produce pseudocode for the
procedure RS-vEB-TREE-INSERT(V, x), which inserts x into the RS-vEB tree V,
calling CREATE-NEW-RS-vEB-TREE as appropriate.

d. Modify the vEB-TREE-SUCCESSOR procedure to produce pseudocode for
the procedure RS-vEB-TREE-SUCCESSOR(V, x), which returns the successor
of x in RS-vEB tree V, or NIL if x has no successor in V.

e. Prove that, under the assumption of simple uniform hashing, your
RS-vEB-TREE-INSERT and RS-vEB-TREE-SUCCESSOR procedures run in $O(\lg \lg u)$
expected time.

f. Assuming that elements are never deleted from a vEB tree, prove that the space
requirement for the RS-vEB tree structure is $O(n)$, where n is the number of
elements actually stored in the RS-vEB tree.

g.  RS-vEB  trees  have  another  advantage  over  vEB  trees:  they  require  less  time  to 
create.  How  long  does  it  take  to  create  an  empty  RS-vEB  tree? 

20-2  y-fast tries

This problem investigates D. Willard’s “y-fast tries” which, like van Emde Boas
trees, perform each of the operations MEMBER, MINIMUM, MAXIMUM,
PREDECESSOR, and SUCCESSOR on elements drawn from a universe with size u in
$O(\lg \lg u)$ worst-case time. The INSERT and DELETE operations take $O(\lg \lg u)$
amortized time. Like reduced-space van Emde Boas trees (see Problem 20-1),
y-fast tries use only $O(n)$ space to store n elements. The design of y-fast tries relies
on perfect hashing (see Section 11.5).

As a preliminary structure, suppose that we create a perfect hash table containing
not only every element in the dynamic set, but every prefix of the binary
representation of every element in the set. For example, if u = 16, so that lg u = 4, and
x = 13 is in the set, then because the binary representation of 13 is 1101, the
perfect hash table would contain the strings 1, 11, 110, and 1101. In addition to
the hash table, we create a doubly linked list of the elements currently in the set, in
increasing order.
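The prefix example above can be reproduced with a few lines of Python. This is a small sketch, not a perfect hash table: the function names are hypothetical, a built-in `set` stands in for the perfect hash table, and u is assumed to be a power of 2 so that every element has exactly lg u bits.

```python
def binary_prefixes(x, u):
    """Return every nonempty prefix of the (lg u)-bit binary form of x."""
    bits = u.bit_length() - 1               # lg u, assuming u is a power of 2
    s = format(x, "0{}b".format(bits))      # zero-padded binary string
    return [s[:i] for i in range(1, bits + 1)]


def build_prefix_table(elements, u):
    """Collect the prefixes of all elements (a set stands in for the hash table)."""
    table = set()
    for x in elements:
        table.update(binary_prefixes(x, u))
    return table
```

With u = 16, `binary_prefixes(13, 16)` yields the four strings from the example in the text. Elements that share leading bits share table entries, which is worth keeping in mind when counting the space in part (a).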

a.  How  much  space  does  this  structure  require? 

b. Show how to perform the MINIMUM and MAXIMUM operations in $O(1)$ time;
the MEMBER, PREDECESSOR, and SUCCESSOR operations in $O(\lg \lg u)$ time;
and the INSERT and DELETE operations in $O(\lg u)$ time.

To reduce the space requirement to $O(n)$, we make the following changes to the
data structure:

•  We cluster the n elements into $n / \lg u$ groups of size $\lg u$. (Assume for now
that $\lg u$ divides n.) The first group consists of the $\lg u$ smallest elements in the
set, the second group consists of the next $\lg u$ smallest elements, and so on.

•  We designate a “representative” value for each group. The representative of
the ith group is at least as large as the largest element in the ith group, and it is
smaller than every element of the (i + 1)st group. (The representative of the last
group can be the maximum possible element $u - 1$.) Note that a representative
might be a value not currently in the set.

•  We  store  the  lg  u  elements  of  each  group  in  a  balanced  binary  search  tree,  such 
as  a  red-black  tree.  Each  representative  points  to  the  balanced  binary  search 
tree  for  its  group,  and  each  balanced  binary  search  tree  points  to  its  group’s 
representative. 

•  The  perfect  hash  table  stores  only  the  representatives,  which  are  also  stored  in 
a  doubly  linked  list  in  increasing  order. 

We call this structure a y-fast trie.

c. Show that a y-fast trie requires only $O(n)$ space to store n elements.

d. Show how to perform the MINIMUM and MAXIMUM operations in $O(\lg \lg u)$
time with a y-fast trie.

e. Show how to perform the MEMBER operation in $O(\lg \lg u)$ time.

f. Show how to perform the PREDECESSOR and SUCCESSOR operations in
$O(\lg \lg u)$ time.

g. Explain why the INSERT and DELETE operations take $\Omega(\lg \lg u)$ time.

h. Show how to relax the requirement that each group in a y-fast trie has exactly
$\lg u$ elements to allow INSERT and DELETE to run in $O(\lg \lg u)$ amortized time
without affecting the asymptotic running times of the other operations.


Chapter  notes 

The data structure in this chapter is named after P. van Emde Boas, who described
an early form of the idea in 1975 [339]. Later papers by van Emde Boas [340]
and van Emde Boas, Kaas, and Zijlstra [341] refined the idea and the exposition.
Mehlhorn and Näher [252] subsequently extended the ideas to apply to universe
sizes that are prime. Mehlhorn’s book [249] contains a slightly different treatment
of van Emde Boas trees than the one in this chapter.

Using  the  ideas  behind  van  Emde  Boas  trees,  Dementiev  et  al.  [83]  developed 
a  nonrecursive,  three-level  search  tree  that  ran  faster  than  van  Emde  Boas  trees  in 
their  own  experiments. 

Wang and Lin [347] designed a hardware-pipelined version of van Emde Boas
trees, which achieves constant amortized time per operation and uses $O(\lg \lg u)$
stages in the pipeline.

A lower bound by Pătraşcu and Thorup [273, 274] for finding the predecessor
shows that van Emde Boas trees are optimal for this operation, even if
randomization is allowed.


Data Structures for Disjoint Sets


Some  applications  involve  grouping  n  distinct  elements  into  a  collection  of  disjoint 
sets.  These  applications  often  need  to  perform  two  operations  in  particular:  finding 
the  unique  set  that  contains  a  given  element  and  uniting  two  sets.  This  chapter 
explores  methods  for  maintaining  a  data  structure  that  supports  these  operations. 

Section 21.1 describes the operations supported by a disjoint-set data structure
and presents a simple application. In Section 21.2, we look at a simple linked-list
implementation for disjoint sets. Section 21.3 presents a more efficient
representation using rooted trees. The running time using the tree representation is
theoretically superlinear, but for all practical purposes it is linear. Section 21.4 defines
and discusses a very quickly growing function and its very slowly growing inverse,
which appears in the running time of operations on the tree-based implementation,
and then, by a complex amortized analysis, proves an upper bound on the running
time that is just barely superlinear.


21.1  Disjoint-set  operations 

A disjoint-set data structure maintains a collection $\mathcal{S} = \{S_1, S_2, \ldots, S_k\}$ of
disjoint dynamic sets. We identify each set by a representative, which is some
member of the set. In some applications, it doesn’t matter which member is used as the
representative; we care only that if we ask for the representative of a dynamic set
twice without modifying the set between the requests, we get the same answer both
times. Other applications may require a prespecified rule for choosing the
representative, such as choosing the smallest member in the set (assuming, of course,
that the elements can be ordered).

As  in  the  other  dynamic-set  implementations  we  have  studied,  we  represent  each 
element  of  a  set  by  an  object.  Letting  x  denote  an  object,  we  wish  to  support  the 
following  operations: 



MAKE-SET(x) creates a new set whose only member (and thus representative)
is x. Since the sets are disjoint, we require that x not already be in some other
set.

UNION(x, y) unites the dynamic sets that contain x and y, say $S_x$ and $S_y$, into a
new set that is the union of these two sets. We assume that the two sets are
disjoint prior to the operation. The representative of the resulting set is any member
of $S_x \cup S_y$, although many implementations of UNION specifically choose the
representative of either $S_x$ or $S_y$ as the new representative. Since we require
the sets in the collection to be disjoint, conceptually we destroy sets $S_x$ and $S_y$,
removing them from the collection $\mathcal{S}$. In practice, we often absorb the elements
of one of the sets into the other set.

FIND-SET(x) returns a pointer to the representative of the (unique) set
containing x.

Throughout this chapter, we shall analyze the running times of disjoint-set data
structures in terms of two parameters: n, the number of MAKE-SET operations,
and m, the total number of MAKE-SET, UNION, and FIND-SET operations. Since
the sets are disjoint, each UNION operation reduces the number of sets by one.
After $n - 1$ UNION operations, therefore, only one set remains. The number of
UNION operations is thus at most $n - 1$. Note also that since the MAKE-SET
operations are included in the total number of operations m, we have $m \ge n$. We
assume that the n MAKE-SET operations are the first n operations performed.

An  application  of  disjoint-set  data  structures 

One of the many applications of disjoint-set data structures arises in
determining the connected components of an undirected graph (see Section B.4).
Figure 21.1(a), for example, shows a graph with four connected components.

The procedure CONNECTED-COMPONENTS that follows uses the disjoint-set
operations to compute the connected components of a graph. Once
CONNECTED-COMPONENTS has preprocessed the graph, the procedure SAME-COMPONENT
answers queries about whether two vertices are in the same connected component.¹
(In pseudocode, we denote the set of vertices of a graph G by G.V and the set of
edges by G.E.)


¹ When the edges of the graph are static (not changing over time), we can compute the connected
components faster by using depth-first search (Exercise 22.3-12). Sometimes, however, the edges
are added dynamically and we need to maintain the connected components as each edge is added. In
this case, the implementation given here can be more efficient than running a new depth-first search
for each new edge.


Figure 21.1  (a) A graph with four connected components: {a, b, c, d}, {e, f, g}, {h, i}, and {j}.
(b) The collection of disjoint sets after processing each edge.


CONNECTED-COMPONENTS(G)
1  for each vertex v ∈ G.V
2      MAKE-SET(v)
3  for each edge (u, v) ∈ G.E
4      if FIND-SET(u) ≠ FIND-SET(v)
5          UNION(u, v)

SAME-COMPONENT(u, v)
1  if FIND-SET(u) == FIND-SET(v)
2      return TRUE
3  else return FALSE

The procedure CONNECTED-COMPONENTS initially places each vertex v in its
own set. Then, for each edge (u, v), it unites the sets containing u and v. By
Exercise 21.1-2, after processing all the edges, two vertices are in the same
connected component if and only if the corresponding objects are in the same set.
Thus, CONNECTED-COMPONENTS computes sets in such a way that the
procedure SAME-COMPONENT can determine whether two vertices are in the same
connected component. Figure 21.1(b) illustrates how CONNECTED-COMPONENTS
computes the disjoint sets.

In an actual implementation of this connected-components algorithm, the
representations of the graph and the disjoint-set data structure would need to reference
each other. That is, an object representing a vertex would contain a pointer to
the corresponding disjoint-set object, and vice versa. These programming details
depend on the implementation language, and we do not address them further here.
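To make the two procedures concrete, here is a minimal Python sketch. It deliberately uses a naive disjoint-set structure (a dictionary mapping each element to its representative), which is an assumption made here for brevity and not one of the representations studied in this chapter. The class name `DisjointSets` and the snake_case function names are likewise illustrative. Vertices are identified directly with dictionary keys, which sidesteps the cross-referencing issue mentioned above.

```python
class DisjointSets:
    """Naive disjoint sets: every element maps directly to its representative."""

    def __init__(self):
        self.rep = {}

    def make_set(self, x):
        self.rep[x] = x                      # x is the only member of its set

    def find_set(self, x):
        return self.rep[x]

    def union(self, x, y):
        rx, ry = self.rep[x], self.rep[y]
        if rx == ry:
            return
        for z in list(self.rep):             # relabel every member of y's set
            if self.rep[z] == ry:
                self.rep[z] = rx


def connected_components(vertices, edges):
    """Mirror of CONNECTED-COMPONENTS: returns the populated structure."""
    ds = DisjointSets()
    for v in vertices:
        ds.make_set(v)
    for u, v in edges:
        if ds.find_set(u) != ds.find_set(v):
            ds.union(u, v)
    return ds


def same_component(ds, u, v):
    """Mirror of SAME-COMPONENT."""
    return ds.find_set(u) == ds.find_set(v)
```

For a usage example, one can build a graph on the vertices a through j whose edge list produces the four components named in the caption of Figure 21.1; the exact edges of that figure are not reproduced in the text, so any edge list used for testing is an assumption.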

Exercises 


21.1-1 

Suppose that CONNECTED-COMPONENTS is run on the undirected graph G =
(V, E), where V = {a, b, c, d, e, f, g, h, i, j, k} and the edges of E are
processed in the order (d, i), (f, k), (g, i), (b, g), (a, h), (i, j), (d, k), (b, j), (d, f),
(g, j), (a, e). List the vertices in each connected component after each iteration of
lines 3-5.


21.1-2 

Show that after all edges are processed by CONNECTED-COMPONENTS, two
vertices are in the same connected component if and only if they are in the same set.


21.1-3 

During the execution of CONNECTED-COMPONENTS on an undirected graph G =
(V, E) with k connected components, how many times is FIND-SET called? How
many times is UNION called? Express your answers in terms of |V|, |E|, and k.


21.2  Linked-list  representation  of  disjoint  sets 

Figure 21.2(a) shows a simple way to implement a disjoint-set data structure: each
set is represented by its own linked list. The object for each set has attributes head,
pointing to the first object in the list, and tail, pointing to the last object. Each
object in the list contains a set member, a pointer to the next object in the list, and
a pointer back to the set object. Within each linked list, the objects may appear in
any order. The representative is the set member in the first object in the list.

With this linked-list representation, both MAKE-SET and FIND-SET are easy,
requiring $O(1)$ time. To carry out MAKE-SET(x), we create a new linked list
whose only object is x. For FIND-SET(x), we just follow the pointer from x back
to its set object and then return the member in the object that head points to. For
example, in Figure 21.2(a), the call FIND-SET(g) would return f.


Figure 21.2  (a) Linked-list representations of two sets. Set $S_1$ contains members d, f, and g, with
representative f, and set $S_2$ contains members b, c, e, and h, with representative c. Each object in
the list contains a set member, a pointer to the next object in the list, and a pointer back to the set
object. Each set object has pointers head and tail to the first and last objects, respectively. (b) The
result of UNION(g, e), which appends the linked list containing e to the linked list containing g. The
representative of the resulting set is f. The set object for e’s list, $S_2$, is destroyed.

A  simple  implementation  of  union 

The simplest implementation of the UNION operation using the linked-list set
representation takes significantly more time than MAKE-SET or FIND-SET. As
Figure 21.2(b) shows, we perform UNION(x, y) by appending y’s list onto the end
of x’s list. The representative of x’s list becomes the representative of the resulting
set. We use the tail pointer for x’s list to quickly find where to append y’s list.
Because all members of y’s list join x’s list, we can destroy the set object for y’s list.
Unfortunately, we must update the pointer to the set object for each object
originally on y’s list, which takes time linear in the length of y’s list. In Figure 21.2, for
example, the operation UNION(g, e) causes pointers to be updated in the objects
for b, c, e, and h.
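The representation and the simple UNION just described can be sketched in Python as follows. The class and attribute names (`SetObject`, `ListObject`, `set`) are choices made for this illustration, and `union` returns the number of back pointers it updated purely so that the cost is visible; neither detail comes from the text.

```python
class SetObject:
    """Set object: head and tail pointers into the set's linked list."""

    def __init__(self):
        self.head = None
        self.tail = None


class ListObject:
    """List object: a member, a next pointer, and a pointer back to the set."""

    def __init__(self, member):
        self.member = member
        self.next = None
        self.set = None


def make_set(member):
    """Create a one-object list and return its list object."""
    x = ListObject(member)
    s = SetObject()
    s.head = s.tail = x
    x.set = s
    return x


def find_set(x):
    """Follow x's back pointer, then return the member at the head."""
    return x.set.head.member


def union(x, y):
    """Append y's list onto x's list; return how many back pointers changed."""
    sx, sy = x.set, y.set
    if sx is sy:
        return 0
    sx.tail.next = sy.head          # splice y's list after x's tail
    sx.tail = sy.tail
    updated = 0
    node = sy.head
    while node is not None:         # linear in the length of y's list
        node.set = sx
        updated += 1
        node = node.next
    return updated                  # sy is now unreferenced ("destroyed")
```

Rebuilding the two lists of Figure 21.2(a) and then calling `union(g, e)` touches exactly the four objects for b, c, e, and h, matching the count given above.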

In fact, we can easily construct a sequence of m operations on n objects that
requires $\Theta(n^2)$ time. Suppose that we have objects $x_1, x_2, \ldots, x_n$. We execute
the sequence of n MAKE-SET operations followed by $n - 1$ UNION operations
shown in Figure 21.3, so that $m = 2n - 1$. We spend $\Theta(n)$ time performing the n
MAKE-SET operations. Because the ith UNION operation updates i objects, the
total number of objects updated by all $n - 1$ UNION operations is

$\sum_{i=1}^{n-1} i = \Theta(n^2)$ .

Operation                    Number of objects updated
MAKE-SET($x_1$)              1
MAKE-SET($x_2$)              1
  ...                        ...
MAKE-SET($x_n$)              1
UNION($x_2$, $x_1$)          1
UNION($x_3$, $x_2$)          2
UNION($x_4$, $x_3$)          3
  ...                        ...
UNION($x_n$, $x_{n-1}$)      $n - 1$

Figure 21.3  A sequence of $2n - 1$ operations on n objects that takes $\Theta(n^2)$ time, or $\Theta(n)$ time
per operation on average, using the linked-list set representation and the simple implementation of
UNION.

The total number of operations is $2n - 1$, and so each operation on average requires
$\Theta(n)$ time. That is, the amortized time of an operation is $\Theta(n)$.

A  weighted-union  heuristic 

In the worst case, the above implementation of the UNION procedure requires an
average of $\Theta(n)$ time per call because we may be appending a longer list onto
a shorter list; we must update the pointer to the set object for each member of
the longer list. Suppose instead that each list also includes the length of the list
(which we can easily maintain) and that we always append the shorter list onto the
longer, breaking ties arbitrarily. With this simple weighted-union heuristic, a
single UNION operation can still take $\Omega(n)$ time if both sets have $\Omega(n)$ members. As
the following theorem shows, however, a sequence of m MAKE-SET, UNION, and
FIND-SET operations, n of which are MAKE-SET operations, takes $O(m + n \lg n)$
time.
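The effect of the heuristic can be observed by counting pointer updates. The sketch below is an illustration under assumptions made here: each set object carries a `length` attribute, `union` takes a `weighted` flag so that both variants can be compared, and the helper `figure_21_3_updates` (a hypothetical name) replays the operation sequence of Figure 21.3 and returns the total number of back pointers updated.

```python
class SetObject:
    """Set object with head, tail, and the list length."""

    def __init__(self, node):
        self.head = self.tail = node
        self.length = 1


class Node:
    """List object; constructing one performs MAKE-SET for its member."""

    def __init__(self, member):
        self.member = member
        self.next = None
        self.set = SetObject(self)


def find_set(x):
    return x.set.head.member


def union(x, y, weighted=True):
    """Unite the sets of x and y; return the number of back pointers updated."""
    big, small = x.set, y.set
    if big is small:
        return 0
    if weighted and big.length < small.length:
        big, small = small, big          # always append the shorter list
    big.tail.next = small.head
    big.tail = small.tail
    big.length += small.length
    updated = 0
    node = small.head
    while node is not None:              # only the appended list is relabeled
        node.set = big
        updated += 1
        node = node.next
    return updated


def figure_21_3_updates(n, weighted):
    """MAKE-SET(x_1..x_n), then UNION(x_{i+1}, x_i) for i = 1, ..., n-1."""
    xs = [Node(i) for i in range(1, n + 1)]
    return sum(union(xs[i], xs[i - 1], weighted) for i in range(1, n))
```

Without the heuristic, the ith UNION appends a list of length i, so the total is the sum 1 + 2 + ... + (n - 1). With it, each UNION in this particular sequence appends a single-object list, so the total drops to n - 1.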

Theorem  21.1 

Using the linked-list representation of disjoint sets and the weighted-union
heuristic, a sequence of m MAKE-SET, UNION, and FIND-SET operations, n of which
are MAKE-SET operations, takes $O(m + n \lg n)$ time.


Proof  Because each UNION operation unites two disjoint sets, we perform at
most $n - 1$ UNION operations over all. We now bound the total time taken by these
UNION operations. We start by determining, for each object, an upper bound on the
number of times the object’s pointer back to its set object is updated. Consider a
particular object x. We know that each time x’s pointer was updated, x must have
started in the smaller set. The first time x’s pointer was updated, therefore, the
resulting set must have had at least 2 members. Similarly, the next time x’s pointer
was updated, the resulting set must have had at least 4 members. Continuing on,
we observe that for any $k \le n$, after x’s pointer has been updated $\lceil \lg k \rceil$ times,
the resulting set must have at least k members. Since the largest set has at most n
members, each object’s pointer is updated at most $\lceil \lg n \rceil$ times over all the UNION
operations. Thus the total time spent updating object pointers over all UNION
operations is $O(n \lg n)$. We must also account for updating the tail pointers and
the list lengths, which take only $\Theta(1)$ time per UNION operation. The total time
spent in all UNION operations is thus $O(n \lg n)$.

The time for the entire sequence of m operations follows easily. Each MAKE-SET
and FIND-SET operation takes $O(1)$ time, and there are $O(m)$ of them. The
total time for the entire sequence is thus $O(m + n \lg n)$.  ∎

Exercises 


21.2-1 

Write  pseudocode  for  Make-Set,  Find-Set,  and  Union  using  the  linked-list 
representation  and  the  weighted-union  heuristic.  Make  sure  to  specify  the  attributes 
that  you  assume  for  set  objects  and  list  objects. 


21.2-2 

Show  the  data  structure  that  results  and  the  answers  returned  by  the  Find-Set 
operations  in  the  following  program.  Use  the  linked-list  representation  with  the 
weighted-union  heuristic. 

1   for i = 1 to 16
2       MAKE-SET($x_i$)
3   for i = 1 to 15 by 2
4       UNION($x_i$, $x_{i+1}$)
5   for i = 1 to 13 by 4
6       UNION($x_i$, $x_{i+2}$)
7   UNION($x_1$, $x_5$)
8   UNION($x_{11}$, $x_{13}$)
9   UNION($x_1$, $x_{10}$)
10  FIND-SET($x_2$)
11  FIND-SET($x_9$)

Assume that if the sets containing $x_i$ and $x_j$ have the same size, then the operation
UNION($x_i$, $x_j$) appends $x_j$’s list onto $x_i$’s list.


21.2-3

Adapt the aggregate proof of Theorem 21.1 to obtain amortized time bounds
of $O(1)$ for MAKE-SET and FIND-SET and $O(\lg n)$ for UNION using the
linked-list representation and the weighted-union heuristic.

21.2-4

Give a tight asymptotic bound on the running time of the sequence of operations in
Figure 21.3 assuming the linked-list representation and the weighted-union
heuristic.


21.2-5

Professor Gompers suspects that it might be possible to keep just one pointer in
each set object, rather than two (head and tail), while keeping the number of
pointers in each list element at two. Show that the professor’s suspicion is well founded
by describing how to represent each set by a linked list such that each operation
has the same running time as the operations described in this section. Describe
also how the operations work. Your scheme should allow for the weighted-union
heuristic, with the same effect as described in this section. (Hint: Use the tail of a
linked list as its set’s representative.)

21.2-6

Suggest  a  simple  change  to  the  UNION  procedure  for  the  linked-list  representation 
that  removes  the  need  to  keep  the  tail  pointer  to  the  last  object  in  each  list.  Whether 
or  not  the  weighted-union  heuristic  is  used,  your  change  should  not  change  the 
asymptotic  running  time  of  the  UNION  procedure.  (Hint:  Rather  than  appending 
one  list  to  another,  splice  them  together.) 


21.3  Disjoint-set  forests 

In a faster implementation of disjoint sets, we represent sets by rooted trees, with
each node containing one member and each tree representing one set. In a
disjoint-set forest, illustrated in Figure 21.4(a), each member points only to its parent. The
root of each tree contains the representative and is its own parent. As we shall
see, although the straightforward algorithms that use this representation are no
faster than ones that use the linked-list representation, by introducing two
heuristics, “union by rank” and “path compression,” we can achieve an asymptotically
optimal disjoint-set data structure.


Figure 21.4  A disjoint-set forest. (a) Two trees representing the two sets of Figure 21.2. The
tree on the left represents the set {b, c, e, h}, with c as the representative, and the tree on the right
represents the set {d, f, g}, with f as the representative. (b) The result of UNION(e, g).

We perform the three disjoint-set operations as follows. A MAKE-SET operation
simply creates a tree with just one node. We perform a FIND-SET operation by
following parent pointers until we find the root of the tree. The nodes visited on
this simple path toward the root constitute the find path. A UNION operation,
shown in Figure 21.4(b), causes the root of one tree to point to the root of the other.

Heuristics  to  improve  the  running  time 

So  far,  we  have  not  improved  on  the  linked-list  implementation.  A  sequence  of 
n − 1 UNION operations may create a tree that is just a linear chain of n nodes. By
using  two  heuristics,  however,  we  can  achieve  a  running  time  that  is  almost  linear 
in  the  total  number  of  operations  m. 

The  first  heuristic,  union  by  rank,  is  similar  to  the  weighted-union  heuristic  we 
used  with  the  linked-list  representation.  The  obvious  approach  would  be  to  make 
the  root  of  the  tree  with  fewer  nodes  point  to  the  root  of  the  tree  with  more  nodes. 
Rather  than  explicitly  keeping  track  of  the  size  of  the  subtree  rooted  at  each  node, 
we  shall  use  an  approach  that  eases  the  analysis.  For  each  node,  we  maintain  a 
rank,  which  is  an  upper  bound  on  the  height  of  the  node.  In  union  by  rank,  we 
make  the  root  with  smaller  rank  point  to  the  root  with  larger  rank  during  a  UNION 
operation. 

The second heuristic, path compression, is also quite simple and highly
effective. As shown in Figure 21.5, we use it during FIND-SET operations to make each
node on the find path point directly to the root. Path compression does not change
any ranks.

Figure 21.5  Path compression during the operation FIND-SET. Arrows and self-loops at roots are
omitted. (a) A tree representing a set prior to executing FIND-SET(a). Triangles represent subtrees
whose roots are the nodes shown. Each node has a pointer to its parent. (b) The same set after
executing FIND-SET(a). Each node on the find path now points directly to the root.

Pseudocode  for  disjoint-set  forests 

To  implement  a  disjoint-set  forest  with  the  union-by-rank  heuristic,  we  must  keep 
track  of  ranks.  With  each  node  x,  we  maintain  the  integer  value  x.rank,  which  is 
an  upper  bound  on  the  height  of  x  (the  number  of  edges  in  the  longest  simple  path 
between  x  and  a  descendant  leaf).  When  Make-Set  creates  a  singleton  set,  the 
single node in the corresponding tree has an initial rank of 0. Each FIND-SET
operation leaves all ranks unchanged. The UNION operation has two cases, depending
on  whether  the  roots  of  the  trees  have  equal  rank.  If  the  roots  have  unequal  rank, 
we  make  the  root  with  higher  rank  the  parent  of  the  root  with  lower  rank,  but  the 
ranks  themselves  remain  unchanged.  If,  instead,  the  roots  have  equal  ranks,  we 
arbitrarily  choose  one  of  the  roots  as  the  parent  and  increment  its  rank. 

Let us put this method into pseudocode. We designate the parent of node x
by x.p. The LINK procedure, a subroutine called by UNION, takes pointers to two
roots as inputs.

MAKE-SET(x)
    x.p = x
    x.rank = 0

UNION(x, y)
    LINK(FIND-SET(x), FIND-SET(y))

LINK(x, y)
    if x.rank > y.rank
        y.p = x
    else x.p = y
        if x.rank == y.rank
            y.rank = y.rank + 1

The FIND-SET procedure with path compression is quite simple:

FIND-SET(x)
1  if x ≠ x.p
2      x.p = FIND-SET(x.p)
3  return x.p

The FIND-SET procedure is a two-pass method: as it recurses, it makes one pass
up  the  find  path  to  find  the  root,  and  as  the  recursion  unwinds,  it  makes  a  second 
pass  back  down  the  find  path  to  update  each  node  to  point  directly  to  the  root.  Each 
call  of  Find-Set(x)  returns  x.p  in  line  3.  If  x  is  the  root,  then  Find-Set  skips 
line  2  and  instead  returns  x.p,  which  is  x;  this  is  the  case  in  which  the  recursion 
bottoms  out.  Otherwise,  line  2  executes,  and  the  recursive  call  with  parameter  x.p 
returns  a  pointer  to  the  root.  Line  2  updates  node  x  to  point  directly  to  the  root, 
and  line  3  returns  this  pointer. 
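
The pseudocode above translates almost line for line into Python. The following is a minimal sketch under stated assumptions: the `Node` class and its `key` attribute are hypothetical names chosen for illustration, and `union` adds a guard (not present in the pseudocode) so that calling it on two members of the same set does not change any rank. Because `find_set` is recursive, as in the text, very long find paths could exceed Python's recursion limit.

```python
class Node:
    """One member of a disjoint-set forest."""

    def __init__(self, key):
        self.key = key
        self.p = self     # MAKE-SET: a root is its own parent
        self.rank = 0     # MAKE-SET: initial rank is 0


def find_set(x):
    """Return the root of x's tree, compressing the find path on the way back."""
    if x is not x.p:
        x.p = find_set(x.p)
    return x.p


def link(x, y):
    """Link two distinct roots by rank."""
    if x.rank > y.rank:
        y.p = x
    else:
        x.p = y
        if x.rank == y.rank:
            y.rank += 1


def union(x, y):
    rx, ry = find_set(x), find_set(y)
    if rx is not ry:      # guard added in this sketch
        link(rx, ry)


# Usage: build {a, b} and {c, d}, then merge them.
a, b, c, d = (Node(k) for k in "abcd")
union(a, b)
union(c, d)
union(a, c)
parent_before = a.p       # a still points to its old root b
root = find_set(a)        # path compression now points a directly at the root
```

After the last call, both `a` and `b` point directly to the root, which illustrates the second pass described above.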

Effect  of  the  heuristics  on  the  running  time 

Separately, either union by rank or path compression improves the running time of
the operations on disjoint-set forests, and the improvement is even greater when
we use the two heuristics together. Alone, union by rank yields a running time
of $O(m \lg n)$ (see Exercise 21.4-4), and this bound is tight (see Exercise 21.3-3).
Although we shall not prove it here, for a sequence of n MAKE-SET operations
(and hence at most n − 1 UNION operations) and f FIND-SET operations,
the path-compression heuristic alone gives a worst-case running time of
$\Theta(n + f \cdot (1 + \log_{2+f/n} n))$.

When we use both union by rank and path compression, the worst-case running
time is $O(m\,\alpha(n))$, where $\alpha(n)$ is a very slowly growing function, which we
define in Section 21.4. In any conceivable application of a disjoint-set data structure,
$\alpha(n) \le 4$; thus, we can view the running time as linear in m in all practical
situations. Strictly speaking, however, it is superlinear. In Section 21.4, we prove this
upper bound.

Exercises 


21.3-1 

Redo  Exercise  21.2-2  using  a  disjoint-set  forest  with  union  by  rank  and  path  com¬ 
pression. 


21.3-2 

Write  a  nonrecursive  version  of  Find-Set  with  path  compression. 


21.3-3 

Give a sequence of m MAKE-SET, UNION, and FIND-SET operations, n of which
are MAKE-SET operations, that takes $\Omega(m \lg n)$ time when we use union by rank
only.


21.3-4

Suppose that we wish to add the operation PRINT-SET(x), which is given a node x
and prints all the members of x’s set, in any order. Show how we can add just
a single attribute to each node in a disjoint-set forest so that PRINT-SET(x) takes
time linear in the number of members of x’s set and the asymptotic running times
of the other operations are unchanged. Assume that we can print each member of
the set in $O(1)$ time.

21.3-5 ★

Show that any sequence of m MAKE-SET, FIND-SET, and LINK operations, where
all the LINK operations appear before any of the FIND-SET operations, takes only
$O(m)$ time if we use both path compression and union by rank. What happens in
the same situation if we use only the path-compression heuristic?


21.4  Analysis of union by rank with path compression

As noted in Section 21.3, the combined union-by-rank and path-compression
heuristic runs in time $O(m\,\alpha(n))$ for m disjoint-set operations on n elements. In this
section, we shall examine the function $\alpha$ to see just how slowly it grows. Then we
prove this running time using the potential method of amortized analysis.


A  very  quickly  growing  function  and  its  very  slowly  growing  inverse 


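For integers $k \ge 0$ and $j \ge 1$, we define the function $A_k(j)$ as

\[
A_k(j) =
\begin{cases}
j + 1 & \text{if } k = 0 , \\
A_{k-1}^{(j+1)}(j) & \text{if } k \ge 1 ,
\end{cases}
\]

where the expression $A_{k-1}^{(j+1)}(j)$ uses the functional-iteration notation given in
Section 3.2. Specifically, $A_{k-1}^{(0)}(j) = j$ and $A_{k-1}^{(i)}(j) = A_{k-1}(A_{k-1}^{(i-1)}(j))$ for $i \ge 1$.
We will refer to the parameter k as the level of the function A.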

The function $A_k(j)$ strictly increases with both j and k. To see just how quickly
this function grows, we first obtain closed-form expressions for $A_1(j)$ and $A_2(j)$.


Lemma  21.2 

For any integer $j \ge 1$, we have $A_1(j) = 2j + 1$.

Proof  We first use induction on i to show that $A_0^{(i)}(j) = j + i$. For the base case,
we have $A_0^{(0)}(j) = j = j + 0$. For the inductive step, assume that
$A_0^{(i-1)}(j) = j + (i - 1)$. Then $A_0^{(i)}(j) = A_0(A_0^{(i-1)}(j)) = (j + (i - 1)) + 1 = j + i$.
Finally, we note that $A_1(j) = A_0^{(j+1)}(j) = j + (j + 1) = 2j + 1$. ■

Lemma  21.3 

For any integer $j \ge 1$, we have $A_2(j) = 2^{j+1}(j + 1) - 1$.

Proof  We first use induction on i to show that $A_1^{(i)}(j) = 2^i (j + 1) - 1$. For
the base case, we have $A_1^{(0)}(j) = j = 2^0 (j + 1) - 1$. For the inductive step,
assume that $A_1^{(i-1)}(j) = 2^{i-1}(j + 1) - 1$. Then

\[
\begin{aligned}
A_1^{(i)}(j) &= A_1(A_1^{(i-1)}(j)) \\
&= A_1(2^{i-1}(j + 1) - 1) \\
&= 2 \cdot (2^{i-1}(j + 1) - 1) + 1 \\
&= 2^i (j + 1) - 2 + 1 \\
&= 2^i (j + 1) - 1 .
\end{aligned}
\]

Finally, we note that $A_2(j) = A_1^{(j+1)}(j) = 2^{j+1}(j + 1) - 1$. ■
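
The closed forms of Lemmas 21.2 and 21.3 can be checked numerically against the recursive definition. The following is a small sketch that evaluates $A_k(j)$ by direct iteration; the function name `A` simply mirrors the text. It is only feasible for tiny arguments, since the values grow explosively with k.

```python
def A(k, j):
    """Evaluate A_k(j) straight from the definition (tiny k and j only)."""
    if k == 0:
        return j + 1
    value = j
    for _ in range(j + 1):        # apply A_{k-1} exactly j + 1 times, starting from j
        value = A(k - 1, value)
    return value


# Compare the definition with the closed forms of Lemmas 21.2 and 21.3.
lemma_21_2_holds = all(A(1, j) == 2 * j + 1 for j in range(1, 20))
lemma_21_3_holds = all(A(2, j) == 2 ** (j + 1) * (j + 1) - 1 for j in range(1, 8))
print(lemma_21_2_holds, lemma_21_3_holds)
```

Level 3 is still within reach of this function for j = 1, but level 4 is not: it would require iterating level 3 on a four-digit argument.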

Now we can see how quickly $A_k(j)$ grows by simply examining $A_k(1)$ for levels
$k = 0, 1, 2, 3, 4$. From the definition of $A_0(k)$ and the above lemmas, we have
$A_0(1) = 1 + 1 = 2$, $A_1(1) = 2 \cdot 1 + 1 = 3$, and $A_2(1) = 2^{1+1} \cdot (1 + 1) - 1 = 7$.
We also have

\[
\begin{aligned}
A_3(1) &= A_2^{(2)}(1) \\
&= A_2(A_2(1)) \\
&= A_2(7) \\
&= 2^8 \cdot 8 - 1 \\
&= 2^{11} - 1 \\
&= 2047
\end{aligned}
\]

and

\[
\begin{aligned}
A_4(1) &= A_3^{(2)}(1) \\
&= A_3(A_3(1)) \\
&= A_3(2047) \\
&= A_2^{(2048)}(2047) \\
&\gg A_2(2047) \\
&= 2^{2048} \cdot 2048 - 1 \\
&> 2^{2048} \\
&= (2^4)^{512} \\
&= 16^{512} \\
&\gg 10^{80} ,
\end{aligned}
\]

which is the estimated number of atoms in the observable universe. (The symbol
“$\gg$” denotes the “much-greater-than” relation.)

We define the inverse of the function $A_k(n)$, for integer $n \ge 0$, by

\[
\alpha(n) = \min \{ k : A_k(1) \ge n \} .
\]

In words, $\alpha(n)$ is the lowest level k for which $A_k(1)$ is at least n. From the above
values of $A_k(1)$, we see that

\[
\alpha(n) =
\begin{cases}
0 & \text{for } 0 \le n \le 2 , \\
1 & \text{for } n = 3 , \\
2 & \text{for } 4 \le n \le 7 , \\
3 & \text{for } 8 \le n \le 2047 , \\
4 & \text{for } 2048 \le n \le A_4(1) .
\end{cases}
\]

It is only for values of n so large that the term “astronomical” understates them
(greater than $A_4(1)$, a huge number) that $\alpha(n) > 4$, and so $\alpha(n) \le 4$ for all
practical purposes.
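
The table of values above is enough to evaluate $\alpha(n)$ for every input that can arise in practice. The following sketch hard-codes the values $A_0(1)$ through $A_3(1)$ from the text and assumes $n \le A_4(1)$, which it cannot verify because $A_4(1)$ is far too large to compute; the names `A_AT_ONE` and `alpha` are illustrative.

```python
# A_k(1) for k = 0, 1, 2, 3, as computed in the text.
A_AT_ONE = [2, 3, 7, 2047]


def alpha(n):
    """Lowest level k with A_k(1) >= n, assuming 0 <= n <= A_4(1)."""
    for k, value in enumerate(A_AT_ONE):
        if value >= n:
            return k
    return 4    # correct only under the assumption n <= A_4(1)


for n in (0, 2, 3, 4, 7, 8, 2047, 2048, 10 ** 80):
    print(n, alpha(n))
```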
Properties of ranks

In the remainder of this section, we prove an $O(m\,\alpha(n))$ bound on the running time
of the disjoint-set operations with union by rank and path compression. In order to
prove this bound, we first prove some simple properties of ranks.

Lemma  21.4 

For all nodes x, we have $x.\mathit{rank} \le x.p.\mathit{rank}$, with strict inequality if $x \ne x.p$.
The value of x.rank is initially 0 and increases through time until $x \ne x.p$; from
then on, x.rank does not change. The value of x.p.rank monotonically increases
over time.

Proof  The proof is a straightforward induction on the number of operations,
using the implementations of MAKE-SET, UNION, and FIND-SET that appear in
Section 21.3. We leave it as Exercise 21.4-1. ■

Corollary  21.5 

As  we  follow  the  simple  path  from  any  node  toward  a  root,  the  node  ranks  strictly 
increase.  ■ 


Lemma  21.6 

Every node has rank at most n − 1.

Proof  Each node’s rank starts at 0, and it increases only upon LINK operations.
Because there are at most n − 1 UNION operations, there are also at most n − 1
LINK operations. Because each LINK operation either leaves all ranks alone or
increases some node’s rank by 1, all ranks are at most n − 1. ■

Lemma 21.6 provides a weak bound on ranks. In fact, every node has rank at
most $\lfloor \lg n \rfloor$ (see Exercise 21.4-2). The looser bound of Lemma 21.6 will suffice
for our purposes, however.
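
These rank properties can be spot-checked by simulation, which is a sanity check and not a proof. The following sketch assumes an array-based forest (`parent` and `rank` lists indexed by element) in place of node objects, and applies random UNION operations; the function name `max_rank_after_random_unions` and the fixed seed are choices made for this illustration.

```python
import random


def find_set(parent, x):
    if parent[x] != x:
        parent[x] = find_set(parent, parent[x])   # path compression
    return parent[x]


def union(parent, rank, x, y):
    rx, ry = find_set(parent, x), find_set(parent, y)
    if rx == ry:
        return
    if rank[rx] > rank[ry]:                       # union by rank
        parent[ry] = rx
    else:
        parent[rx] = ry
        if rank[rx] == rank[ry]:
            rank[ry] += 1


def max_rank_after_random_unions(n, steps, seed=0):
    """Apply random unions, asserting the rank properties after every step."""
    rng = random.Random(seed)
    parent, rank = list(range(n)), [0] * n
    for _ in range(steps):
        union(parent, rank, rng.randrange(n), rng.randrange(n))
        for x in range(n):
            if parent[x] != x:
                assert rank[x] < rank[parent[x]]  # Lemma 21.4 / Corollary 21.5
            assert rank[x] <= n - 1               # Lemma 21.6
    return max(rank)


n = 64
largest = max_rank_after_random_unions(n, steps=200)
print(largest, "<=", n.bit_length() - 1)          # floor(lg n) for n >= 1
```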

Proving  the  time  bound 

We shall use the potential method of amortized analysis (see Section 17.3) to prove
the $O(m\,\alpha(n))$ time bound. In performing the amortized analysis, we will find it
convenient to assume that we invoke the LINK operation rather than the UNION
operation. That is, since the parameters of the LINK procedure are pointers to two
roots, we act as though we perform the appropriate FIND-SET operations
separately. The following lemma shows that even if we count the extra FIND-SET
operations induced by UNION calls, the asymptotic running time remains unchanged.

Lemma 21.7

Suppose we convert a sequence S′ of m′ MAKE-SET, UNION, and FIND-SET
operations into a sequence S of m MAKE-SET, LINK, and FIND-SET operations by
turning each UNION into two FIND-SET operations followed by a LINK. Then, if
sequence S runs in $O(m\,\alpha(n))$ time, sequence S′ runs in $O(m'\,\alpha(n))$ time.

Proof  Since each UNION operation in sequence S′ is converted into three
operations in S, we have $m' \le m \le 3m'$. Since $m = O(m')$, an $O(m\,\alpha(n))$ time bound
for the converted sequence S implies an $O(m'\,\alpha(n))$ time bound for the original
sequence S′. ■

In the remainder of this section, we shall assume that the initial sequence of m′
MAKE-SET, UNION, and FIND-SET operations has been converted to a sequence
of m MAKE-SET, LINK, and FIND-SET operations. We now prove an $O(m\,\alpha(n))$
time bound for the converted sequence and appeal to Lemma 21.7 to prove the
$O(m'\,\alpha(n))$ running time of the original sequence of m′ operations.

Potential  function 

The potential function we use assigns a potential $\phi_q(x)$ to each node x in the
disjoint-set forest after q operations. We sum the node potentials for the
potential of the entire forest: $\Phi_q = \sum_x \phi_q(x)$, where $\Phi_q$ denotes the potential of the
forest after q operations. The forest is empty prior to the first operation, and we
arbitrarily set $\Phi_0 = 0$. No potential $\Phi_q$ will ever be negative.

The value of $\phi_q(x)$ depends on whether x is a tree root after the qth operation.
If it is, or if $x.\mathit{rank} = 0$, then $\phi_q(x) = \alpha(n) \cdot x.\mathit{rank}$.

Now suppose that after the qth operation, x is not a root and that $x.\mathit{rank} \ge 1$.
We need to define two auxiliary functions on x before we can define $\phi_q(x)$. First
we define

\[
\mathrm{level}(x) = \max \{ k : x.p.\mathit{rank} \ge A_k(x.\mathit{rank}) \} .
\]

That is, level(x) is the greatest level k for which $A_k$, applied to x’s rank, is no
greater than x’s parent’s rank.

We claim that

\[
0 \le \mathrm{level}(x) < \alpha(n) , \tag{21.1}
\]

which we see as follows. We have

\[
\begin{aligned}
x.p.\mathit{rank} &\ge x.\mathit{rank} + 1 && \text{(by Lemma 21.4)} \\
&= A_0(x.\mathit{rank}) && \text{(by definition of } A_0(j)\text{)} ,
\end{aligned}
\]

which implies that $\mathrm{level}(x) \ge 0$, and we have

\[
\begin{aligned}
A_{\alpha(n)}(x.\mathit{rank}) &\ge A_{\alpha(n)}(1) && \text{(because } A_k(j) \text{ is strictly increasing)} \\
&\ge n && \text{(by the definition of } \alpha(n)\text{)} \\
&> x.p.\mathit{rank} && \text{(by Lemma 21.6)} ,
\end{aligned}
\]

which implies that $\mathrm{level}(x) < \alpha(n)$. Note that because x.p.rank monotonically
increases over time, so does level(x).

The second auxiliary function applies when $x.\mathit{rank} \ge 1$:

\[
\mathrm{iter}(x) = \max \{ i : x.p.\mathit{rank} \ge A_{\mathrm{level}(x)}^{(i)}(x.\mathit{rank}) \} .
\]

That is, iter(x) is the largest number of times we can iteratively apply $A_{\mathrm{level}(x)}$,
applied initially to x’s rank, before we get a value greater than x’s parent’s rank.
We claim that when $x.\mathit{rank} \ge 1$, we have

\[
1 \le \mathrm{iter}(x) \le x.\mathit{rank} , \tag{21.2}
\]

which we see as follows. We have

\[
\begin{aligned}
x.p.\mathit{rank} &\ge A_{\mathrm{level}(x)}(x.\mathit{rank}) && \text{(by definition of level}(x)\text{)} \\
&= A_{\mathrm{level}(x)}^{(1)}(x.\mathit{rank}) && \text{(by definition of functional iteration)} ,
\end{aligned}
\]

which implies that $\mathrm{iter}(x) \ge 1$, and we have

\[
\begin{aligned}
A_{\mathrm{level}(x)}^{(x.\mathit{rank}+1)}(x.\mathit{rank}) &= A_{\mathrm{level}(x)+1}(x.\mathit{rank}) && \text{(by definition of } A_k(j)\text{)} \\
&> x.p.\mathit{rank} && \text{(by definition of level}(x)\text{)} ,
\end{aligned}
\]

which implies that $\mathrm{iter}(x) \le x.\mathit{rank}$. Note that because x.p.rank monotonically
increases over time, in order for iter(x) to decrease, level(x) must increase. As long
as level(x) remains unchanged, iter(x) must either increase or remain unchanged.

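With these auxiliary functions in place, we are ready to define the potential of
node x after q operations:

\[
\phi_q(x) =
\begin{cases}
\alpha(n) \cdot x.\mathit{rank} & \text{if } x \text{ is a root or } x.\mathit{rank} = 0 , \\
(\alpha(n) - \mathrm{level}(x)) \cdot x.\mathit{rank} - \mathrm{iter}(x) & \text{if } x \text{ is not a root and } x.\mathit{rank} \ge 1 .
\end{cases}
\]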
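
To make these definitions concrete, the following sketch computes level(x), iter(x), and the node potential from a node's rank and its parent's rank. It is an illustration under stated assumptions, not part of the analysis: the helper `A_capped` (a hypothetical name) evaluates $A_k(j)$ only up to a cap, because exact values quickly become too large to compute; `iter_` carries a trailing underscore to avoid shadowing Python's built-in `iter`; and `alpha_n` is passed in by the caller as the value of $\alpha(n)$.

```python
def A_capped(k, j, cap):
    """Return A_k(j) exactly when A_k(j) <= cap; otherwise return some value > cap."""
    if j > cap:
        return cap + 1            # A_k(j) > j > cap, so the exact value is not needed
    if k == 0:
        return j + 1
    value = j
    for _ in range(j + 1):        # A_k(j) is A_{k-1} applied j + 1 times to j
        value = A_capped(k - 1, value, cap)
        if value > cap:
            return value          # further applications only increase the value
    return value


def level(rank, parent_rank):
    """max{k : parent_rank >= A_k(rank)}; assumes 1 <= rank < parent_rank."""
    k = 0
    while A_capped(k + 1, rank, parent_rank) <= parent_rank:
        k += 1
    return k


def iter_(rank, parent_rank):
    """max{i : parent_rank >= A_level^(i)(rank)}; assumes 1 <= rank < parent_rank."""
    k = level(rank, parent_rank)
    count, value = 0, rank
    while True:
        value = A_capped(k, value, parent_rank)
        if value > parent_rank:
            return count
        count += 1


def potential(rank, parent_rank, is_root, alpha_n):
    """Node potential as defined in the text; parent_rank is ignored for roots."""
    if is_root or rank == 0:
        return alpha_n * rank
    return (alpha_n - level(rank, parent_rank)) * rank - iter_(rank, parent_rank)


# Example: a non-root node of rank 2 whose parent has rank 5, with alpha(n) taken as 4.
print(level(2, 5), iter_(2, 5), potential(2, 5, False, 4))
```

For the example node, $A_1(2) = 5 \le 5$ but $A_2(2) = 23 > 5$, so its level is 1; applying $A_1$ a second time already exceeds 5, so its iter value is 1.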

We  next  investigate  some  useful  properties  of  node  potentials. 


Lemma  21.8 

For every node x, and for all operation counts q, we have

\[
0 \le \phi_q(x) \le \alpha(n) \cdot x.\mathit{rank} .
\]

Proof  If x is a root or $x.\mathit{rank} = 0$, then $\phi_q(x) = \alpha(n) \cdot x.\mathit{rank}$ by definition. Now
suppose that x is not a root and that $x.\mathit{rank} \ge 1$. We obtain a lower bound on $\phi_q(x)$
by maximizing level(x) and iter(x). By the bound (21.1), $\mathrm{level}(x) \le \alpha(n) - 1$, and
by the bound (21.2), $\mathrm{iter}(x) \le x.\mathit{rank}$. Thus,

\[
\begin{aligned}
\phi_q(x) &= (\alpha(n) - \mathrm{level}(x)) \cdot x.\mathit{rank} - \mathrm{iter}(x) \\
&\ge (\alpha(n) - (\alpha(n) - 1)) \cdot x.\mathit{rank} - x.\mathit{rank} \\
&= x.\mathit{rank} - x.\mathit{rank} \\
&= 0 .
\end{aligned}
\]

Similarly, we obtain an upper bound on $\phi_q(x)$ by minimizing level(x) and iter(x).
By the bound (21.1), $\mathrm{level}(x) \ge 0$, and by the bound (21.2), $\mathrm{iter}(x) \ge 1$. Thus,

\[
\begin{aligned}
\phi_q(x) &\le (\alpha(n) - 0) \cdot x.\mathit{rank} - 1 \\
&= \alpha(n) \cdot x.\mathit{rank} - 1 \\
&< \alpha(n) \cdot x.\mathit{rank} .
\end{aligned}
\]
■

Corollary  21.9 

If node x is not a root and $x.\mathit{rank} > 0$, then $\phi_q(x) < \alpha(n) \cdot x.\mathit{rank}$. ■


Potential  changes  and  amortized  costs  of  operations 

We  are  now  ready  to  examine  how  the  disjoint-set  operations  affect  node  potentials. 
With  an  understanding  of  the  change  in  potential  due  to  each  operation,  we  can 
determine  each  operation’s  amortized  cost. 

Lemma  21.10 

Let x be a node that is not a root, and suppose that the qth operation is either a
LINK or FIND-SET. Then after the qth operation, $\phi_q(x) \le \phi_{q-1}(x)$. Moreover, if
$x.\mathit{rank} \ge 1$ and either level(x) or iter(x) changes due to the qth operation, then
$\phi_q(x) \le \phi_{q-1}(x) - 1$. That is, x’s potential cannot increase, and if it has positive
rank and either level(x) or iter(x) changes, then x’s potential drops by at least 1.

Proof  Because x is not a root, the qth operation does not change x.rank, and
because n does not change after the initial n MAKE-SET operations, $\alpha(n)$ remains
unchanged as well. Hence, these components of the formula for x’s potential
remain the same after the qth operation. If $x.\mathit{rank} = 0$, then $\phi_q(x) = \phi_{q-1}(x) = 0$.
Now assume that $x.\mathit{rank} \ge 1$.

Recall that level(x) monotonically increases over time. If the qth operation
leaves level(x) unchanged, then iter(x) either increases or remains unchanged.
If both level(x) and iter(x) are unchanged, then $\phi_q(x) = \phi_{q-1}(x)$. If level(x)
is unchanged and iter(x) increases, then it increases by at least 1, and so
$\phi_q(x) \le \phi_{q-1}(x) - 1$.

Finally, if the qth operation increases level(x), it increases by at least 1, so that
the value of the term $(\alpha(n) - \mathrm{level}(x)) \cdot x.\mathit{rank}$ drops by at least x.rank.
Because level(x) increased, the value of iter(x) might drop, but according to the
bound (21.2), the drop is by at most $x.\mathit{rank} - 1$. Thus, the increase in
potential due to the change in iter(x) is less than the decrease in potential due to the
change in level(x), and we conclude that $\phi_q(x) \le \phi_{q-1}(x) - 1$. ■

Our final three lemmas show that the amortized cost of each MAKE-SET, LINK,
and FIND-SET operation is $O(\alpha(n))$. Recall from equation (17.2) that the
amortized cost of each operation is its actual cost plus the increase in potential due to
the operation.

Lemma  21.11 

The amortized cost of each MAKE-SET operation is $O(1)$.

Proof  Suppose that the qth operation is MAKE-SET(x). This operation creates
node x with rank 0, so that $\phi_q(x) = 0$. No other ranks or potentials change, and
so $\Phi_q = \Phi_{q-1}$. Noting that the actual cost of the MAKE-SET operation is $O(1)$
completes the proof. ■

Lemma  21.12 

The amortized cost of each LINK operation is $O(\alpha(n))$.

Proof  Suppose that the qth operation is LINK(x, y). The actual cost of the LINK
operation is $O(1)$. Without loss of generality, suppose that the LINK makes y the
parent of x.

To determine the change in potential due to the LINK, we note that the only
nodes whose potentials may change are x, y, and the children of y just prior to the
operation. We shall show that the only node whose potential can increase due to
the LINK is y, and that its increase is at most $\alpha(n)$:

•  By Lemma 21.10, any node that is y’s child just before the LINK cannot have
   its potential increase due to the LINK.

•  From the definition of $\phi_q(x)$, we see that, since x was a root just before the qth
   operation, $\phi_{q-1}(x) = \alpha(n) \cdot x.\mathit{rank}$. If $x.\mathit{rank} = 0$, then $\phi_q(x) = \phi_{q-1}(x) = 0$.
   Otherwise,

   \[
   \begin{aligned}
   \phi_q(x) &< \alpha(n) \cdot x.\mathit{rank} && \text{(by Corollary 21.9)} \\
   &= \phi_{q-1}(x) ,
   \end{aligned}
   \]

   and so x’s potential decreases.

•  Because y is a root prior to the LINK, $\phi_{q-1}(y) = \alpha(n) \cdot y.\mathit{rank}$. The LINK
   operation leaves y as a root, and it either leaves y’s rank alone or it increases y’s
   rank by 1. Therefore, either $\phi_q(y) = \phi_{q-1}(y)$ or $\phi_q(y) = \phi_{q-1}(y) + \alpha(n)$.

The increase in potential due to the LINK operation, therefore, is at most $\alpha(n)$.
The amortized cost of the LINK operation is $O(1) + \alpha(n) = O(\alpha(n))$. ■

Lemma  21.13 

The amortized cost of each FIND-SET operation is $O(\alpha(n))$.

Proof  Suppose that the qth operation is a FIND-SET and that the find path
contains s nodes. The actual cost of the FIND-SET operation is $O(s)$. We shall
show that no node’s potential increases due to the FIND-SET and that at least
$\max(0, s - (\alpha(n) + 2))$ nodes on the find path have their potential decrease by
at least 1.

To see that no node’s potential increases, we first appeal to Lemma 21.10 for all
nodes other than the root. If x is the root, then its potential is $\alpha(n) \cdot x.\mathit{rank}$, which
does not change.

Now we show that at least $\max(0, s - (\alpha(n) + 2))$ nodes have their potential
decrease by at least 1. Let x be a node on the find path such that $x.\mathit{rank} > 0$
and x is followed somewhere on the find path by another node y that is not a root,
where $\mathrm{level}(y) = \mathrm{level}(x)$ just before the FIND-SET operation. (Node y need not
immediately follow x on the find path.) All but at most $\alpha(n) + 2$ nodes on the find
path satisfy these constraints on x. Those that do not satisfy them are the first node
on the find path (if it has rank 0), the last node on the path (i.e., the root), and the
last node w on the path for which $\mathrm{level}(w) = k$, for each $k = 0, 1, 2, \ldots, \alpha(n) - 1$.

Let us fix such a node x, and we shall show that x’s potential decreases by at
least 1. Let $k = \mathrm{level}(x) = \mathrm{level}(y)$. Just prior to the path compression caused by
the FIND-SET, we have

\[
\begin{aligned}
x.p.\mathit{rank} &\ge A_k^{(\mathrm{iter}(x))}(x.\mathit{rank}) && \text{(by definition of iter}(x)\text{)} , \\
y.p.\mathit{rank} &\ge A_k(y.\mathit{rank}) && \text{(by definition of level}(y)\text{)} , \\
y.\mathit{rank} &\ge x.p.\mathit{rank} && \text{(by Corollary 21.5 and because } y \text{ follows } x \text{ on the find path)} .
\end{aligned}
\]

Putting these inequalities together and letting i be the value of iter(x) before path
compression, we have

\[
\begin{aligned}
y.p.\mathit{rank} &\ge A_k(y.\mathit{rank}) \\
&\ge A_k(x.p.\mathit{rank}) && \text{(because } A_k(j) \text{ is strictly increasing)} \\
&\ge A_k(A_k^{(\mathrm{iter}(x))}(x.\mathit{rank})) \\
&= A_k^{(i+1)}(x.\mathit{rank}) .
\end{aligned}
\]

Because path compression will make x and y have the same parent, we know
that after path compression, $x.p.\mathit{rank} = y.p.\mathit{rank}$ and that the path compression
does not decrease y.p.rank. Since x.rank does not change, after path compression
we have that $x.p.\mathit{rank} \ge A_k^{(i+1)}(x.\mathit{rank})$. Thus, path compression will cause
either iter(x) to increase (to at least i + 1) or level(x) to increase (which occurs if
iter(x) increases to at least x.rank + 1). In either case, by Lemma 21.10, we have
$\phi_q(x) \le \phi_{q-1}(x) - 1$. Hence, x’s potential decreases by at least 1.

The amortized cost of the FIND-SET operation is the actual cost plus the change
in potential. The actual cost is $O(s)$, and we have shown that the total potential
decreases by at least $\max(0, s - (\alpha(n) + 2))$. The amortized cost, therefore, is at
most $O(s) - (s - (\alpha(n) + 2)) = O(s) - s + O(\alpha(n)) = O(\alpha(n))$, since we can
scale up the units of potential to dominate the constant hidden in $O(s)$. ■

Putting  the  preceding  lemmas  together  yields  the  following  theorem. 

Theorem  21.14 

A sequence of m MAKE-SET, UNION, and FIND-SET operations, n of which are
MAKE-SET operations, can be performed on a disjoint-set forest with union by
rank and path compression in worst-case time $O(m\,\alpha(n))$.

Proof  Immediate from Lemmas 21.7, 21.11, 21.12, and 21.13. ■

Exercises 


21.4-1 

Prove  Lemma  21.4. 


21.4-2

Prove that every node has rank at most $\lfloor \lg n \rfloor$.

21.4-3

In  light  of  Exercise  21.4-2,  how  many  bits  are  necessary  to  store  x.rank  for  each 
node  x? 


21.4-4 

Using Exercise 21.4-2, give a simple proof that operations on a disjoint-set forest
with union by rank but without path compression run in $O(m \lg n)$ time.


21.4-5 

Professor Dante reasons that because node ranks increase strictly along a simple
path to the root, node levels must monotonically increase along the path. In other
words, if $x.\mathit{rank} > 0$ and x.p is not a root, then $\mathrm{level}(x) \le \mathrm{level}(x.p)$. Is the
professor correct?

21.4-6 ★

Consider the function $\alpha'(n) = \min \{ k : A_k(1) \ge \lg(n + 1) \}$. Show that $\alpha'(n) \le 3$
for all practical values of n and, using Exercise 21.4-2, show how to modify the
potential-function argument to prove that we can perform a sequence of m MAKE-SET,
UNION, and FIND-SET operations, n of which are MAKE-SET operations, on
a disjoint-set forest with union by rank and path compression in worst-case time
$O(m\,\alpha'(n))$.


Problems 


21-1  Off-line minimum

The off-line minimum problem asks us to maintain a dynamic set T of elements
from the domain $\{1, 2, \ldots, n\}$ under the operations INSERT and EXTRACT-MIN.
We are given a sequence S of n INSERT and m EXTRACT-MIN calls, where each
key in $\{1, 2, \ldots, n\}$ is inserted exactly once. We wish to determine which key
is returned by each EXTRACT-MIN call. Specifically, we wish to fill in an array
extracted[1..m], where for $i = 1, 2, \ldots, m$, extracted[i] is the key returned by
the ith EXTRACT-MIN call. The problem is “off-line” in the sense that we are
allowed to process the entire sequence S before determining any of the returned
keys.

a.  In the following instance of the off-line minimum problem, each operation
    INSERT(i) is represented by the value of i and each EXTRACT-MIN is
    represented by the letter E:

    4, 8, E, 3, E, 9, 2, 6, E, E, E, 1, 7, E, 5 .

    Fill in the correct values in the extracted array.

To develop an algorithm for this problem, we break the sequence S into
homogeneous subsequences. That is, we represent S by

\[
I_1, E, I_2, E, I_3, \ldots, I_m, E, I_{m+1} ,
\]

where each E represents a single EXTRACT-MIN call and each $I_j$ represents a
(possibly empty) sequence of INSERT calls. For each subsequence $I_j$, we initially place
the keys inserted by these operations into a set $K_j$, which is empty if $I_j$ is empty.
We then do the following:


Off-Line-Minimum(m, n)
1  for i = 1 to n
2      determine j such that i ∈ K_j
3      if j ≠ m + 1
4          extracted[j] = i
5          let l be the smallest value greater than j
               for which set K_l exists
6          K_l = K_j ∪ K_l, destroying K_j
7  return extracted
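As a cross-check on the pseudocode, the following is a minimal Python sketch that follows Off-Line-Minimum literally, keeping each set K_j as an ordinary Python set. The function name `off_line_minimum`, the list encoding of S (integers for Insert calls, the string "E" for Extract-Min calls), and the linear scans over the sets are assumptions made for illustration. It is not the efficient disjoint-set implementation that part (c) asks for.

```python
def off_line_minimum(ops):
    """Run Off-Line-Minimum on ops, a list of ints (Insert) and "E" (Extract-Min).

    Assumes the inserted keys are exactly 1..n, each inserted once.
    """
    n = sum(1 for op in ops if op != "E")
    m = len(ops) - n

    # K[j] holds the keys inserted by subsequence I_j, for j = 1..m+1.
    K = {j: set() for j in range(1, m + 2)}
    j = 1
    for op in ops:
        if op == "E":
            j += 1
        else:
            K[j].add(op)

    extracted = [None] * (m + 1)  # index 0 unused, to mirror extracted[1..m]
    for i in range(1, n + 1):
        # Line 2: find the set that currently contains key i (naive scan).
        j = next(idx for idx, keys in K.items() if i in keys)
        if j != m + 1:
            extracted[j] = i
            # Line 5: smallest surviving index greater than j.
            l = min(idx for idx in K if idx > j)
            # Line 6: merge K_j into K_l and destroy K_j.
            K[l] |= K[j]
            del K[j]
    return extracted[1:]


sequence = [4, 8, "E", 3, "E", 9, 2, 6, "E", "E", "E", 1, 7, "E", 5]
print(off_line_minimum(sequence))
```

An entry stays `None` if the corresponding Extract-Min is never assigned a key, which can only happen for an input that extracts from an empty set.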

b.  Argue  that  the  array  extracted  returned  by  Off-Line-Minimum  is  correct. 

c.  Describe  how  to  implement  Off-Line-Minimum  efficiently  with  a  disjoint- 
set  data  structure.  Give  a  tight  bound  on  the  worst-case  running  time  of  your 
implementation. 

21-2  Depth  determination 

In the depth-determination problem, we maintain a forest $\mathcal{F} = \{T_i\}$ of
rooted trees under three operations:

Make-Tree(v) creates a tree whose only node is v.

Find-Depth(v) returns the depth of node v within its tree.

Graft(r, v) makes node r, which is assumed to be the root of a tree, become the
child of node v, which is assumed to be in a different tree than r but may or may
not itself be a root.

a. Suppose that we use a tree representation similar to a disjoint-set forest: v.p
is the parent of node v, except that v.p = v if v is a root. Suppose further
that we implement Graft(r, v) by setting r.p = v and Find-Depth(v) by
following the find path up to the root, returning a count of all nodes other than v
encountered. Show that the worst-case running time of a sequence of m Make-Tree,
Find-Depth, and Graft operations is $\Theta(m^2)$.
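The naive representation described in part (a) can be sketched in a few lines of Python. The class name `NaiveForest` and the dictionary used to hold the parent pointers are assumptions made for illustration; the sketch only restates the representation and does not prove the bound that part (a) asks for.

```python
class NaiveForest:
    """Forest with parent pointers only; a root r satisfies p[r] == r."""

    def __init__(self):
        self.p = {}

    def make_tree(self, v):
        # A new tree whose only node is v.
        self.p[v] = v

    def graft(self, r, v):
        # r is assumed to be a root in a different tree than v.
        self.p[r] = v

    def find_depth(self, v):
        # Walk the find path to the root, counting the nodes other than v.
        depth = 0
        while self.p[v] != v:
            v = self.p[v]
            depth += 1
        return depth


forest = NaiveForest()
for node in range(4):
    forest.make_tree(node)
# Build the chain 0 <- 1 <- 2 <- 3 (each node is grafted below the previous one).
forest.graft(1, 0)
forest.graft(2, 1)
forest.graft(3, 2)
print(forest.find_depth(3))
```

Each `find_depth` call costs time proportional to the depth it reports, so long chains make individual queries expensive.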

By using the union-by-rank and path-compression heuristics, we can reduce the
worst-case running time. We use the disjoint-set forest $\mathcal{S} = \{S_i\}$, where
each set $S_i$ (which is itself a tree) corresponds to a tree $T_i$ in the forest
$\mathcal{F}$. The tree structure within a set $S_i$, however, does not necessarily
correspond to that of $T_i$. In fact, the implementation of $S_i$ does not record the
exact parent-child relationships but nevertheless allows us to determine any
node’s depth in $T_i$.

The key idea is to maintain in each node v a “pseudodistance” v.d, which is
defined so that the sum of the pseudodistances along the simple path from v to the
root of its set $S_i$ equals the depth of v in $T_i$. That is, if the simple path from
v to its root in $S_i$ is $v_0, v_1, \ldots, v_k$, where $v_0 = v$ and $v_k$ is $S_i$’s root,
then the depth of v in $T_i$ is $\sum_{j=0}^{k} v_j.d$.

b.  Give  an  implementation  of  Make-Tree. 

c. Show how to modify Find-Set to implement Find-Depth. Your implementation
should perform path compression, and its running time should be linear
in the length of the find path. Make sure that your implementation updates
pseudodistances correctly.

d. Show how to implement Graft(r, v), which combines the sets containing r
and v, by modifying the Union and Link procedures. Make sure that your
implementation updates pseudodistances correctly. Note that the root of a set $S_i$
is not necessarily the root of the corresponding tree $T_i$.

e. Give a tight bound on the worst-case running time of a sequence of m Make-Tree,
Find-Depth, and Graft operations, n of which are Make-Tree operations.

21-3  Tarjan’s  off-line  least-common-ancestors  algorithm 
The least common ancestor of two nodes u and v in a rooted tree T is the node w
that is an ancestor of both u and v and that has the greatest depth in T. In the
off-line least-common-ancestors problem, we are given a rooted tree T and an
arbitrary set $P = \{\{u, v\}\}$ of unordered pairs of nodes in T, and we wish to
determine the least common ancestor of each pair in P.

To solve the off-line least-common-ancestors problem, the following procedure
performs a tree walk of T with the initial call LCA(T.root). We assume that each
node is colored WHITE prior to the walk.

LCA(u)
1  Make-Set(u)
2  Find-Set(u).ancestor = u
3  for each child v of u in T
4      LCA(v)
5      Union(u, v)
6      Find-Set(u).ancestor = u
7  u.color = BLACK
8  for each node v such that {u, v} ∈ P
9      if v.color == BLACK
10         print “The least common ancestor of”
               u “and” v “is” Find-Set(v).ancestor
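The LCA procedure can be tried out with the following Python sketch, which pairs the tree walk with a small disjoint-set forest using union by rank and path compression. The function name `tarjan_lca`, the `children` dictionary describing the tree, and the choice to return a dictionary instead of printing are assumptions made for illustration. The recursion also assumes the tree is shallow enough for Python's default recursion limit.

```python
def tarjan_lca(children, root, pairs):
    """Return {frozenset({u, v}): lca} for each unordered pair in pairs.

    children maps each node to a list of its children (leaves may be omitted).
    """
    parent, rank, ancestor = {}, {}, {}
    black = set()
    result = {}

    # queries[u] lists the other endpoint of every pair that contains u.
    queries = {}
    for u, v in pairs:
        queries.setdefault(u, []).append(v)
        queries.setdefault(v, []).append(u)

    def find_set(x):
        # First pass: locate the representative.
        rep = x
        while parent[rep] != rep:
            rep = parent[rep]
        # Second pass: path compression.
        while parent[x] != rep:
            parent[x], x = rep, parent[x]
        return rep

    def union(x, y):
        x, y = find_set(x), find_set(y)
        if x == y:
            return
        if rank[x] > rank[y]:  # union by rank
            parent[y] = x
        else:
            parent[x] = y
            if rank[x] == rank[y]:
                rank[y] += 1

    def lca(u):
        parent[u], rank[u] = u, 0            # line 1: Make-Set(u)
        ancestor[find_set(u)] = u            # line 2
        for v in children.get(u, []):        # lines 3-6
            lca(v)
            union(u, v)
            ancestor[find_set(u)] = u
        black.add(u)                         # line 7
        for v in queries.get(u, []):         # lines 8-10
            if v in black:
                result[frozenset((u, v))] = ancestor[find_set(v)]

    lca(root)
    return result


tree = {1: [2, 3], 2: [4, 5], 3: [6]}
answers = tarjan_lca(tree, 1, [(4, 5), (4, 6), (2, 4), (5, 3)])
print(answers[frozenset((4, 6))])
```

Storing `ancestor` in a dictionary keyed by the set representative plays the role of the `ancestor` attribute on the node returned by Find-Set.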


a. Argue that line 10 executes exactly once for each pair $\{u, v\} \in P$.

b.  Argue  that  at  the  time  of  the  call  LCA(u),  the  number  of  sets  in  the  disjoint-set 
data  structure  equals  the  depth  of  u  in  T . 

c. Prove that LCA correctly prints the least common ancestor of u and v for each
pair $\{u, v\} \in P$.

d.  Analyze  the  running  time  of  LCA,  assuming  that  we  use  the  implementation  of 
the  disjoint-set  data  structure  in  Section  21.3. 


Chapter  notes 

Many of the important results for disjoint-set data structures are due at least in part
to R. E. Tarjan. Using aggregate analysis, Tarjan [328, 330] gave the first tight
upper bound in terms of the very slowly growing inverse $\alpha(m, n)$ of Ackermann’s
function. (The function $A_k(j)$ given in Section 21.4 is similar to Ackermann’s
function, and the function $\alpha(n)$ is similar to the inverse. Both $\alpha(n)$ and
$\alpha(m, n)$ are at most 4 for all conceivable values of m and n.) An $O(m \lg^* n)$
upper bound was proven earlier by Hopcroft and Ullman [5, 179]. The treatment in
Section 21.4 is adapted from a later analysis by Tarjan [332], which is in turn
based on an analysis by Kozen [220]. Harfst and Reingold [161] give a
potential-based version of Tarjan’s earlier bound.

Tarjan and van Leeuwen [333] discuss variants on the path-compression heuristic,
including “one-pass methods,” which sometimes offer better constant factors
in their performance than do two-pass methods. As with Tarjan’s earlier analyses
of the basic path-compression heuristic, the analyses by Tarjan and van Leeuwen
are aggregate. Harfst and Reingold [161] later showed how to make a small change
to the potential function to adapt their path-compression analysis to these one-pass
variants. Gabow and Tarjan [121] show that in certain applications, the disjoint-set
operations can be made to run in $O(m)$ time.

Tarjan [329] showed that a lower bound of $\Omega(m \, \alpha(m, n))$ time is required for
operations on any disjoint-set data structure satisfying certain technical conditions.
This lower bound was later generalized by Fredman and Saks [113], who showed
that in the worst case, $\Omega(m \, \alpha(m, n))$ $(\lg n)$-bit words of memory must be
accessed.


VI  Graph  Algorithms 


Introduction 


Graph  problems  pervade  computer  science,  and  algorithms  for  working  with  them 
are  fundamental  to  the  field.  Hundreds  of  interesting  computational  problems  are 
couched  in  terms  of  graphs.  In  this  part,  we  touch  on  a  few  of  the  more  significant 
ones. 

Chapter  22  shows  how  we  can  represent  a  graph  in  a  computer  and  then  discusses 
algorithms  based  on  searching  a  graph  using  either  breadth-first  search  or  depth- 
first  search.  The  chapter  gives  two  applications  of  depth-first  search:  topologically 
sorting  a  directed  acyclic  graph  and  decomposing  a  directed  graph  into  its  strongly 
connected  components. 

Chapter  23  describes  how  to  compute  a  minimum-weight  spanning  tree  of  a 
graph:  the  least-weight  way  of  connecting  all  of  the  vertices  together  when  each 
edge  has  an  associated  weight.  The  algorithms  for  computing  minimum  spanning 
trees  serve  as  good  examples  of  greedy  algorithms  (see  Chapter  16). 

Chapters  24  and  25  consider  how  to  compute  shortest  paths  between  vertices 
when  each  edge  has  an  associated  length  or  “weight.”  Chapter  24  shows  how  to 
find  shortest  paths  from  a  given  source  vertex  to  all  other  vertices,  and  Chapter  25 
examines  methods  to  compute  shortest  paths  between  every  pair  of  vertices. 

Finally,  Chapter  26  shows  how  to  compute  a  maximum  flow  of  material  in  a  flow 
network,  which  is  a  directed  graph  having  a  specified  source  vertex  of  material,  a 
specified  sink  vertex,  and  specified  capacities  for  the  amount  of  material  that  can 
traverse  each  directed  edge.  This  general  problem  arises  in  many  forms,  and  a 
good  algorithm  for  computing  maximum  flows  can  help  solve  a  variety  of  related 
problems efficiently.

When we characterize the running time of a graph algorithm on a given graph
G = (V, E), we usually measure the size of the input in terms of the number of
vertices $|V|$ and the number of edges $|E|$ of the graph. That is, we describe the
size of the input with two parameters, not just one. We adopt a common notational
convention for these parameters. Inside asymptotic notation (such as O-notation
or $\Theta$-notation), and only inside such notation, the symbol V denotes $|V|$ and
the symbol E denotes $|E|$. For example, we might say, “the algorithm runs in
time $O(VE)$,” meaning that the algorithm runs in time $O(|V| \, |E|)$. This
convention makes the running-time formulas easier to read, without risk of ambiguity.

Another convention we adopt appears in pseudocode. We denote the vertex set
of a graph G by G.V and its edge set by G.E. That is, the pseudocode views vertex
and edge sets as attributes of a graph.


22 Elementary Graph Algorithms


This chapter presents methods for representing a graph and for searching a graph.
Searching a graph means systematically following the edges of the graph so as to
visit the vertices of the graph. A graph-searching algorithm can discover much
about the structure of a graph. Many algorithms begin by searching their input
graph to obtain this structural information. Several other graph algorithms
elaborate on basic graph searching. Techniques for searching a graph lie at the
heart of the field of graph algorithms.

Section 22.1 discusses the two most common computational representations of
graphs: as adjacency lists and as adjacency matrices. Section 22.2 presents a
simple graph-searching algorithm called breadth-first search and shows how to
create a breadth-first tree. Section 22.3 presents depth-first search and proves some
standard results about the order in which depth-first search visits vertices.
Section 22.4 provides our first real application of depth-first search: topologically
sorting a directed acyclic graph. A second application of depth-first search, finding
the strongly connected components of a directed graph, is the topic of Section 22.5.


22.1  Representations  of  graphs 

We can choose between two standard ways to represent a graph G = (V, E):
as a collection of adjacency lists or as an adjacency matrix. Either way applies
to both directed and undirected graphs. Because the adjacency-list representation
provides a compact way to represent sparse graphs (those for which $|E|$ is much
less than $|V|^2$), it is usually the method of choice. Most of the graph algorithms
presented in this book assume that an input graph is represented in adjacency-list
form. We may prefer an adjacency-matrix representation, however, when the
graph is dense ($|E|$ is close to $|V|^2$) or when we need to be able to tell quickly
if there is an edge connecting two given vertices. For example, two of the all-pairs
shortest-paths algorithms presented in Chapter 25 assume that their input graphs
are represented by adjacency matrices.

Figure 22.1 Two representations of an undirected graph. (a) An undirected graph G
with 5 vertices and 7 edges. (b) An adjacency-list representation of G. (c) The
adjacency-matrix representation of G.

Figure 22.2 Two representations of a directed graph. (a) A directed graph G with
6 vertices and 8 edges. (b) An adjacency-list representation of G. (c) The
adjacency-matrix representation of G.

The adjacency-list representation of a graph G = (V, E) consists of an
array Adj of $|V|$ lists, one for each vertex in V. For each $u \in V$, the adjacency list
Adj[u] contains all the vertices v such that there is an edge $(u, v) \in E$. That is,
Adj[u] consists of all the vertices adjacent to u in G. (Alternatively, it may contain
pointers to these vertices.) Since the adjacency lists represent the edges of a graph,
in pseudocode we treat the array Adj as an attribute of the graph, just as we treat
the edge set E. In pseudocode, therefore, we will see notation such as G.Adj[u].
Figure 22.1(b) is an adjacency-list representation of the undirected graph in
Figure 22.1(a). Similarly, Figure 22.2(b) is an adjacency-list representation of the
directed graph in Figure 22.2(a).

If G is a directed graph, the sum of the lengths of all the adjacency lists is $|E|$,
since an edge of the form (u, v) is represented by having v appear in Adj[u]. If G is
an undirected graph, the sum of the lengths of all the adjacency lists is $2|E|$, since
if (u, v) is an undirected edge, then u appears in v’s adjacency list and vice versa.
For both directed and undirected graphs, the adjacency-list representation has the
desirable property that the amount of memory it requires is $\Theta(V + E)$.
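The counting argument above can be checked with a small Python sketch. The function name `build_adj`, the use of a dictionary of lists in place of the array Adj, and the edge list itself (5 vertices, 7 edges, no self-loops) are assumptions chosen for illustration rather than data taken from the figures.

```python
def build_adj(vertices, edges, directed):
    """Build adjacency lists as a dict mapping each vertex to a list."""
    adj = {u: [] for u in vertices}
    for u, v in edges:
        adj[u].append(v)          # v appears in Adj[u]
        if not directed:
            adj[v].append(u)      # an undirected edge is stored twice
    return adj


vertices = [1, 2, 3, 4, 5]
edges = [(1, 2), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5)]

directed_adj = build_adj(vertices, edges, directed=True)
undirected_adj = build_adj(vertices, edges, directed=False)

# Sum of list lengths: |E| for the directed graph, 2|E| for the undirected one.
print(sum(len(lst) for lst in directed_adj.values()))
print(sum(len(lst) for lst in undirected_adj.values()))
```

Either way the structure holds one list per vertex plus one or two entries per edge, which is where the linear memory bound comes from.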

We can readily adapt adjacency lists to represent weighted graphs, that is, graphs
for which each edge has an associated weight, typically given by a weight function
$w : E \to \mathbb{R}$. For example, let G = (V, E) be a weighted graph with weight
function w. We simply store the weight w(u, v) of the edge $(u, v) \in E$ with
vertex v in u’s adjacency list. The adjacency-list representation is quite robust in
that we can modify it to support many other graph variants.
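One way to store the weight with the vertex, sketched here in Python, is to keep (v, w(u, v)) pairs in each list. The function name `build_weighted_adj` and the sample weights are hypothetical, and the sketch assumes a directed graph.

```python
def build_weighted_adj(vertices, weighted_edges):
    """Adjacency lists whose entries are (neighbor, weight) pairs."""
    adj = {u: [] for u in vertices}
    for u, v, w in weighted_edges:
        adj[u].append((v, w))     # weight w(u, v) travels with v in Adj[u]
    return adj


adj = build_weighted_adj(
    ["a", "b", "c"],
    [("a", "b", 2.5), ("a", "c", 1.0), ("b", "c", -4.0)],
)
for v, w in adj["a"]:
    print(v, w)
```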

A potential disadvantage of the adjacency-list representation is that it provides
no quicker way to determine whether a given edge (u, v) is present in the graph
than to search for v in the adjacency list Adj[u]. An adjacency-matrix
representation of the graph remedies this disadvantage, but at the cost of using
asymptotically more memory. (See Exercise 22.1-8 for suggestions of variations on
adjacency lists that permit faster edge lookup.)

For the adjacency-matrix representation of a graph G = (V, E), we assume
that the vertices are numbered $1, 2, \ldots, |V|$ in some arbitrary manner. Then the
adjacency-matrix representation of a graph G consists of a $|V| \times |V|$ matrix
$A = (a_{ij})$ such that

$$a_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E, \\ 0 & \text{otherwise.} \end{cases}$$

Figures 22.1(c) and 22.2(c) are the adjacency matrices of the undirected and
directed graphs in Figures 22.1(a) and 22.2(a), respectively. The adjacency matrix of
a graph requires $\Theta(V^2)$ memory, independent of the number of edges in the graph.
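A minimal Python sketch of the adjacency-matrix representation follows. The names `build_matrix` and `has_edge`, the nested-list storage, and the sample edges are assumptions for illustration. Vertices are numbered 1 through n, as in the text, and shifted by one to index Python lists.

```python
def build_matrix(n, edges, directed):
    """Return an n x n 0/1 matrix for vertices numbered 1..n."""
    A = [[0] * n for _ in range(n)]   # n*n entries regardless of edge count
    for u, v in edges:
        A[u - 1][v - 1] = 1
        if not directed:
            A[v - 1][u - 1] = 1
    return A


def has_edge(A, u, v):
    # Constant-time lookup, in contrast to scanning Adj[u].
    return A[u - 1][v - 1] == 1


A = build_matrix(4, [(1, 2), (2, 3), (3, 1), (4, 4)], directed=True)
print(has_edge(A, 1, 2), has_edge(A, 2, 1))
```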

Observe the symmetry along the main diagonal of the adjacency matrix in
Figure 22.1(c). Since in an undirected graph, (u, v) and (v, u) represent the same
edge, the adjacency matrix A of an undirected graph is its own transpose: $A = A^T$.
In some applications, it pays to store only the entries on and above the diagonal of
the adjacency matrix, thereby cutting the memory needed to store the graph almost
in half.
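One possible way to store only the entries on and above the diagonal is to pack them row by row into a flat list, as in the Python sketch below. The class name `SymmetricMatrix`, the 0-based indices, and the packing order are assumptions; other layouts work equally well.

```python
class SymmetricMatrix:
    """Symmetric n x n matrix storing only entries with i <= j (0-based)."""

    def __init__(self, n):
        self.n = n
        self.data = [0] * (n * (n + 1) // 2)   # about half of n*n entries

    def _index(self, i, j):
        if i > j:
            i, j = j, i                         # (i, j) and (j, i) share a slot
        # Rows 0..i-1 contribute n, n-1, ..., n-i+1 entries before row i.
        return i * self.n - i * (i - 1) // 2 + (j - i)

    def get(self, i, j):
        return self.data[self._index(i, j)]

    def set(self, i, j, value):
        self.data[self._index(i, j)] = value


M = SymmetricMatrix(5)
M.set(3, 1, 1)
print(M.get(1, 3), len(M.data))
```

For a 5-vertex graph this keeps 15 entries instead of 25.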

Like the adjacency-list representation of a graph, an adjacency matrix can
represent a weighted graph. For example, if G = (V, E) is a weighted graph with
edge-weight function w, we can simply store the weight w(u, v) of the edge
$(u, v) \in E$ as the entry in row u and column v of the adjacency matrix. If an edge
does not exist, we can store a NIL value as its corresponding matrix entry, though
for many problems it is convenient to use a value such as 0 or $\infty$.

Although the adjacency-list representation is asymptotically at least as
space-efficient as the adjacency-matrix representation, adjacency matrices are
simpler, and so we may prefer them when graphs are reasonably small. Moreover,
adjacency matrices carry a further advantage for unweighted graphs: they require only
one bit per entry.

Representing  attributes 

Most algorithms that operate on graphs need to maintain attributes for vertices
and/or edges. We indicate these attributes using our usual notation, such as v.d
for an attribute d of a vertex v. When we indicate edges as pairs of vertices, we
use the same style of notation. For example, if edges have an attribute f, then we
denote this attribute for edge (u, v) by (u, v).f. For the purpose of presenting and
understanding algorithms, our attribute notation suffices.

Implementing vertex and edge attributes in real programs can be another story
entirely. There is no one best way to store and access vertex and edge attributes.
For a given situation, your decision will likely depend on the programming
language you are using, the algorithm you are implementing, and how the rest of your
program uses the graph. If you represent a graph using adjacency lists, one design
represents vertex attributes in additional arrays, such as an array d[1..|V|] that
parallels the Adj array. If the vertices adjacent to u are in Adj[u], then what we call
the attribute u.d would actually be stored in the array entry d[u]. Many other ways
of implementing attributes are possible. For example, in an object-oriented
programming language, vertex attributes might be represented as instance variables
within a subclass of a Vertex class.
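The two designs mentioned above might look as follows in Python. Both are sketches under assumptions: the variable names `adj` and `d`, the small sample graph, and the subclass name `BFSVertex` are hypothetical, and neither design is meant as the preferred one.

```python
import math

# Design 1: parallel arrays indexed by vertex number 1..n (index 0 unused).
n = 3
adj = [[], [2, 3], [3], []]       # adj[u] lists the vertices adjacent to u
d = [math.inf] * (n + 1)          # d[u] stores what the text writes as u.d
d[1] = 0


# Design 2: attributes as instance variables of a Vertex subclass.
class Vertex:
    def __init__(self, name):
        self.name = name
        self.adj = []             # references to adjacent Vertex objects


class BFSVertex(Vertex):
    def __init__(self, name):
        super().__init__(name)
        self.d = math.inf         # algorithm-specific attribute


s = BFSVertex("s")
s.d = 0
print(d[1], s.d)
```

Parallel arrays keep the graph structure untouched between algorithms, while the subclass keeps each vertex's data in one place; which matters more depends on the surrounding program.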

Exercises 


22.1-1 

Given  an  adjacency-list  representation  of  a  directed  graph,  how  long  does  it  take 
to  compute  the  out-degree  of  every  vertex?  How  long  does  it  take  to  compute  the 
in-degrees? 


22.1-2 

Give  an  adjacency-list  representation  for  a  complete  binary  tree  on  7  vertices.  Give 
an  equivalent  adjacency-matrix  representation.  Assume  that  vertices  are  numbered 
from  1  to  7  as  in  a  binary  heap. 


22.1-3 

The transpose of a directed graph G = (V, E) is the graph $G^T = (V, E^T)$, where
$E^T = \{(v, u) \in V \times V : (u, v) \in E\}$. Thus, $G^T$ is G with all its edges
reversed. Describe efficient algorithms for computing $G^T$ from G, for both the
adjacency-list and adjacency-matrix representations of G. Analyze the running
times of your algorithms.


22.1-4

Given an adjacency-list representation of a multigraph G = (V, E), describe an
$O(V + E)$-time algorithm to compute the adjacency-list representation of the
“equivalent” undirected graph $G' = (V, E')$, where $E'$ consists of the edges in E
with all multiple edges between two vertices replaced by a single edge and with all
self-loops removed.


22.1-5 

The square of a directed graph G = (V, E) is the graph $G^2 = (V, E^2)$ such that
$(u, v) \in E^2$ if and only if G contains a path with at most two edges between u
and v. Describe efficient algorithms for computing $G^2$ from G for both the
adjacency-list and adjacency-matrix representations of G. Analyze the running
times of your algorithms.


22.1-6 

Most graph algorithms that take an adjacency-matrix representation as input
require time $\Omega(V^2)$, but there are some exceptions. Show how to determine whether
a directed graph G contains a universal sink (a vertex with in-degree $|V| - 1$ and
out-degree 0) in time $O(V)$, given an adjacency matrix for G.


22.1-7 

The incidence matrix of a directed graph G = (V, E) with no self-loops is a
$|V| \times |E|$ matrix $B = (b_{ij})$ such that

$$b_{ij} = \begin{cases} -1 & \text{if edge } j \text{ leaves vertex } i, \\ 1 & \text{if edge } j \text{ enters vertex } i, \\ 0 & \text{otherwise.} \end{cases}$$

Describe what the entries of the matrix product $B B^T$ represent, where $B^T$ is the
transpose of B.


22.1-8 

Suppose that instead of a linked list, each array entry Adj[u] is a hash table
containing the vertices v for which $(u, v) \in E$. If all edge lookups are equally
likely, what is the expected time to determine whether an edge is in the graph? What
disadvantages does this scheme have? Suggest an alternate data structure for each
edge list that solves these problems. Does your alternative have disadvantages
compared to the hash table?


22.2 Breadth-first search

Breadth-first  search  is  one  of  the  simplest  algorithms  for  searching  a  graph  and 
the  archetype  for  many  important  graph  algorithms.  Prim’s  minimum-spanning- 
tree  algorithm  (Section  23.2)  and  Dijkstra’s  single-source  shortest-paths  algorithm 
(Section  24.3)  use  ideas  similar  to  those  in  breadth-first  search. 

Given  a  graph  G  =  (V,  E)  and  a  distinguished  source  vertex  s,  breadth-first 
search  systematically  explores  the  edges  of  G  to  “discover”  every  vertex  that  is 
reachable  from  s.  It  computes  the  distance  (smallest  number  of  edges)  from  s 
to  each  reachable  vertex.  It  also  produces  a  “breadth-first  tree”  with  root  s  that 
contains  all  reachable  vertices.  For  any  vertex  v  reachable  from  s,  the  simple  path 
in  the  breadth-first  tree  from  s  to  v  corresponds  to  a  “shortest  path”  from  s  to  v 
in  G,  that  is,  a  path  containing  the  smallest  number  of  edges.  The  algorithm  works 
on  both  directed  and  undirected  graphs. 

Breadth-first search is so named because it expands the frontier between
discovered and undiscovered vertices uniformly across the breadth of the frontier. That
is, the algorithm discovers all vertices at distance k from s before discovering any
vertices at distance k + 1.

To keep track of progress, breadth-first search colors each vertex white, gray, or
black. All vertices start out white and may later become gray and then black. A
vertex is discovered the first time it is encountered during the search, at which time
it becomes nonwhite. Gray and black vertices, therefore, have been discovered, but
breadth-first search distinguishes between them to ensure that the search proceeds
in a breadth-first manner.¹ If $(u, v) \in E$ and vertex u is black, then vertex v
is either gray or black; that is, all vertices adjacent to black vertices have been
discovered. Gray vertices may have some adjacent white vertices; they represent
the frontier between discovered and undiscovered vertices.

Breadth-first  search  constructs  a  breadth-first  tree,  initially  containing  only  its 
root,  which  is  the  source  vertex  s.  Whenever  the  search  discovers  a  white  vertex  v 
in  the  course  of  scanning  the  adjacency  list  of  an  already  discovered  vertex  u,  the 
vertex v and the edge (u, v) are added to the tree. We say that u is the predecessor
or  parent  of  v  in  the  breadth-first  tree.  Since  a  vertex  is  discovered  at  most  once,  it 
has  at  most  one  parent.  Ancestor  and  descendant  relationships  in  the  breadth-first 
tree  are  defined  relative  to  the  root  s  as  usual:  if  u  is  on  the  simple  path  in  the  tree 
from  the  root  s  to  vertex  v,  then  u  is  an  ancestor  of  v  and  v  is  a  descendant  of  u. 


¹ We distinguish between gray and black vertices to help us understand how
breadth-first search operates. In fact, as Exercise 22.2-3 shows, we would get the
same result even if we did not distinguish between gray and black vertices.

The breadth-first-search procedure BFS below assumes that the input graph
G = (V, E) is represented using adjacency lists. It attaches several additional
attributes to each vertex in the graph. We store the color of each vertex $u \in V$
in the attribute u.color and the predecessor of u in the attribute $u.\pi$. If u has no
predecessor (for example, if u = s or u has not been discovered), then $u.\pi$ = NIL.
The attribute u.d holds the distance from the source s to vertex u computed by the
algorithm. The algorithm also uses a first-in, first-out queue Q (see Section 10.1)
to manage the set of gray vertices.

BFS(G, s)
1  for each vertex u ∈ G.V - {s}
2      u.color = WHITE
3      u.d = ∞
4      u.π = NIL
5  s.color = GRAY
6  s.d = 0
7  s.π = NIL
8  Q = ∅
9  Enqueue(Q, s)
10 while Q ≠ ∅
11     u = Dequeue(Q)
12     for each v ∈ G.Adj[u]
13         if v.color == WHITE
14             v.color = GRAY
15             v.d = u.d + 1
16             v.π = u
17             Enqueue(Q, v)
18     u.color = BLACK

Figure  22.3  illustrates  the  progress  of  BFS  on  a  sample  graph. 
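For readers who want to run the procedure, here is a minimal Python sketch of BFS that mirrors the pseudocode line by line. It is a sketch under assumptions: the graph is a dictionary mapping each vertex to its adjacency list, the attributes color, d, and π are kept in dictionaries named `color`, `d`, and `pi`, and the sample undirected graph is chosen for illustration rather than read off the figure.

```python
import math
from collections import deque


def bfs(adj, s):
    """Breadth-first search from s; returns (d, pi) as dictionaries."""
    color = {u: "WHITE" for u in adj}           # lines 1-4
    d = {u: math.inf for u in adj}
    pi = {u: None for u in adj}
    color[s] = "GRAY"                           # lines 5-7
    d[s] = 0
    q = deque([s])                              # lines 8-9
    while q:                                    # line 10
        u = q.popleft()                         # line 11
        for v in adj[u]:                        # line 12
            if color[v] == "WHITE":             # line 13
                color[v] = "GRAY"
                d[v] = d[u] + 1
                pi[v] = u
                q.append(v)
        color[u] = "BLACK"                      # line 18
    return d, pi


graph = {
    "s": ["r", "w"], "r": ["s", "v"], "w": ["s", "t", "x"], "v": ["r"],
    "t": ["w", "x", "u"], "x": ["w", "t", "u", "y"],
    "u": ["t", "x", "y"], "y": ["x", "u"],
}
dist, pred = bfs(graph, "s")
print(dist)
```

A `collections.deque` stands in for the first-in, first-out queue Q, with `append` as Enqueue and `popleft` as Dequeue.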

The procedure BFS works as follows. With the exception of the source vertex s,
lines 1-4 paint every vertex white, set u.d to be infinity for each vertex u, and set
the parent of every vertex to be NIL. Line 5 paints s gray, since we consider it to be
discovered as the procedure begins. Line 6 initializes s.d to 0, and line 7 sets the
predecessor of the source to be NIL. Lines 8-9 initialize Q to the queue containing
just the vertex s.

The while loop of lines 10-18 iterates as long as there remain gray vertices,
which are discovered vertices that have not yet had their adjacency lists fully
examined. This while loop maintains the following invariant:

At the test in line 10, the queue Q consists of the set of gray vertices.

Figure 22.3 The operation of BFS on an undirected graph. Tree edges are shown
shaded as they are produced by BFS. The value of u.d appears within each vertex u.
The queue Q is shown at the beginning of each iteration of the while loop of
lines 10-18. Vertex distances appear below vertices in the queue.


Although we won’t use this loop invariant to prove correctness, it is easy to see
that it holds prior to the first iteration and that each iteration of the loop maintains
the invariant. Prior to the first iteration, the only gray vertex, and the only vertex
in Q, is the source vertex s. Line 11 determines the gray vertex u at the head of
the queue Q and removes it from Q. The for loop of lines 12-17 considers each
vertex v in the adjacency list of u. If v is white, then it has not yet been discovered,
and the procedure discovers it by executing lines 14-17. The procedure paints
vertex v gray, sets its distance v.d to u.d + 1, records u as its parent $v.\pi$, and places
it at the tail of the queue Q. Once the procedure has examined all the vertices on u’s
adjacency list, it blackens u in line 18. The loop invariant is maintained because
whenever a vertex is painted gray (in line 14) it is also enqueued (in line 17), and
whenever a vertex is dequeued (in line 11) it is also painted black (in line 18).

The results of breadth-first search may depend upon the order in which the
neighbors of a given vertex are visited in line 12: the breadth-first tree may vary, but the
distances d computed by the algorithm will not. (See Exercise 22.2-5.)
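This dependence on neighbor order is easy to observe on a small example. The Python sketch below is an illustration under assumptions: the helper `bfs` is a compact variant that treats a vertex with infinite distance as undiscovered instead of tracking colors, and the 4-cycle graph is hypothetical.

```python
import math
from collections import deque


def bfs(adj, s):
    """Compact BFS: a vertex counts as undiscovered while d[v] is infinite."""
    d = {u: math.inf for u in adj}
    pi = {u: None for u in adj}
    d[s] = 0
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if d[v] == math.inf:
                d[v] = d[u] + 1
                pi[v] = u
                q.append(v)
    return d, pi


# The same undirected 4-cycle s-a-c-b-s, with s's neighbors listed in two orders.
order1 = {"s": ["a", "b"], "a": ["s", "c"], "b": ["s", "c"], "c": ["a", "b"]}
order2 = {"s": ["b", "a"], "a": ["s", "c"], "b": ["s", "c"], "c": ["a", "b"]}

d1, pi1 = bfs(order1, "s")
d2, pi2 = bfs(order2, "s")
print(d1 == d2)            # distances agree
print(pi1["c"], pi2["c"])  # the parent of c differs
```

Vertex c is at distance 2 either way, but its tree parent is whichever of a and b is dequeued first.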

Analysis 

Before proving the various properties of breadth-first search, we take on the
somewhat easier job of analyzing its running time on an input graph G = (V, E). We
use aggregate analysis, as we saw in Section 17.1. After initialization, breadth-first
search never whitens a vertex, and thus the test in line 13 ensures that each vertex
is enqueued at most once, and hence dequeued at most once. The operations of
enqueuing and dequeuing take $O(1)$ time, and so the total time devoted to queue
operations is $O(V)$. Because the procedure scans the adjacency list of each vertex
only when the vertex is dequeued, it scans each adjacency list at most once. Since
the sum of the lengths of all the adjacency lists is $\Theta(E)$, the total time spent in
scanning adjacency lists is $O(E)$. The overhead for initialization is $O(V)$, and
thus the total running time of the BFS procedure is $O(V + E)$. Thus, breadth-first
search runs in time linear in the size of the adjacency-list representation of G.
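The aggregate counts in this argument can be observed directly by instrumenting the search. The sketch below is an illustration under assumptions: the function name `bfs_counts`, the use of a set of discovered vertices in place of colors, and the sample graph are all hypothetical.

```python
from collections import deque


def bfs_counts(adj, s):
    """Run BFS from s and return (number of enqueues, adjacency entries scanned)."""
    discovered = {s}
    q = deque([s])
    enqueues, scanned = 1, 0
    while q:
        u = q.popleft()
        for v in adj[u]:
            scanned += 1                  # one unit of work per list entry
            if v not in discovered:
                discovered.add(v)
                q.append(v)
                enqueues += 1             # each vertex is enqueued at most once
    return enqueues, scanned


# Undirected 4-cycle plus an isolated vertex z: |V| = 5, |E| = 4.
graph = {"s": ["a", "b"], "a": ["s", "c"], "b": ["s", "c"], "c": ["a", "b"], "z": []}
print(bfs_counts(graph, "s"))
```

On this graph the search enqueues 4 of the 5 vertices and scans 8 list entries, that is, twice the number of edges reachable from s.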

Shortest  paths 

At the beginning of this section, we claimed that breadth-first search finds the
distance to each reachable vertex in a graph G = (V, E) from a given source vertex
$s \in V$. Define the shortest-path distance $\delta(s, v)$ from s to v as the minimum
number of edges in any path from vertex s to vertex v; if there is no path from s to v,
then $\delta(s, v) = \infty$. We call a path of length $\delta(s, v)$ from s to v a shortest path²
from s to v. Before showing that breadth-first search correctly computes
shortest-path distances, we investigate an important property of shortest-path distances.


2 In Chapters 24 and 25, we shall generalize our study of shortest paths to weighted graphs, in which
every edge has a real-valued weight and the weight of a path is the sum of the weights of its
constituent edges. The graphs considered in the present chapter are unweighted or, equivalently, all
edges have unit weight.

Lemma 22.1

Let G = (V, E) be a directed or undirected graph, and let $s \in V$ be an arbitrary
vertex. Then, for any edge $(u, v) \in E$,

$$\delta(s, v) \le \delta(s, u) + 1 .$$


Proof  If u is reachable from s, then so is v. In this case, the shortest path from s
to v cannot be longer than the shortest path from s to u followed by the edge (u, v),
and thus the inequality holds. If u is not reachable from s, then $\delta(s, u) = \infty$, and
the inequality holds. ■

We want to show that BFS properly computes $v.d = \delta(s, v)$ for each vertex
$v \in V$. We first show that v.d bounds $\delta(s, v)$ from above.

Lemma  22.2 

Let G = (V, E) be a directed or undirected graph, and suppose that BFS is run
on G from a given source vertex $s \in V$. Then upon termination, for each vertex
$v \in V$, the value v.d computed by BFS satisfies $v.d \ge \delta(s, v)$.

Proof  We use induction on the number of Enqueue operations. Our inductive
hypothesis is that $v.d \ge \delta(s, v)$ for all $v \in V$.

The basis of the induction is the situation immediately after enqueuing s in line 9
of BFS. The inductive hypothesis holds here, because $s.d = 0 = \delta(s, s)$ and
$v.d = \infty \ge \delta(s, v)$ for all $v \in V - \{s\}$.

For the inductive step, consider a white vertex v that is discovered during the
search from a vertex u. The inductive hypothesis implies that $u.d \ge \delta(s, u)$. From
the assignment performed by line 15 and from Lemma 22.1, we obtain

$$v.d = u.d + 1 \ge \delta(s, u) + 1 \ge \delta(s, v) .$$

Vertex v is then enqueued, and it is never enqueued again because it is also grayed
and the then clause of lines 14-17 is executed only for white vertices. Thus, the
value of v.d never changes again, and the inductive hypothesis is maintained. ■

To prove that $v.d = \delta(s, v)$, we must first show more precisely how the queue Q
operates during the course of BFS. The next lemma shows that at all times, the
queue holds at most two distinct d values.

Lemma 22.3

Suppose that during the execution of BFS on a graph G = (V, E), the queue Q
contains the vertices $\langle v_1, v_2, \ldots, v_r \rangle$, where $v_1$ is the head of Q and $v_r$ is the tail.
Then, $v_r.d \le v_1.d + 1$ and $v_i.d \le v_{i+1}.d$ for $i = 1, 2, \ldots, r - 1$.

Proof  The proof is by induction on the number of queue operations. Initially,
when the queue contains only s, the lemma certainly holds.

For the inductive step, we must prove that the lemma holds after both dequeuing
and enqueuing a vertex. If the head $v_1$ of the queue is dequeued, $v_2$ becomes the
new head. (If the queue becomes empty, then the lemma holds vacuously.) By the
inductive hypothesis, $v_1.d \le v_2.d$. But then we have $v_r.d \le v_1.d + 1 \le v_2.d + 1$,
and the remaining inequalities are unaffected. Thus, the lemma follows with $v_2$ as
the head.

In order to understand what happens upon enqueuing a vertex, we need to examine
the code more closely. When we enqueue a vertex v in line 17 of BFS, it
becomes $v_{r+1}$. At that time, we have already removed vertex u, whose adjacency
list is currently being scanned, from the queue Q, and by the inductive hypothesis,
the new head $v_1$ has $v_1.d \ge u.d$. Thus, $v_{r+1}.d = v.d = u.d + 1 \le v_1.d + 1$. From
the inductive hypothesis, we also have $v_r.d \le u.d + 1$, and so $v_r.d \le u.d + 1 =
v.d = v_{r+1}.d$, and the remaining inequalities are unaffected. Thus, the lemma
follows when v is enqueued. ■
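The invariant of Lemma 22.3 can be observed directly by instrumenting a breadth-first search. The sketch below is an illustration only: the dictionary-based graph and the helper names `check_queue_invariant` and `bfs_checked` are assumptions of this example, and membership in `d` stands in for the white/non-white test.

```python
from collections import deque

def check_queue_invariant(queue, d):
    """Assert the two inequalities of Lemma 22.3 for the current queue."""
    values = [d[v] for v in queue]
    if values:
        assert values[-1] <= values[0] + 1                      # tail vs. head
        assert all(a <= b for a, b in zip(values, values[1:]))  # nondecreasing

def bfs_checked(adj, s):
    """BFS that checks the queue invariant after every queue operation.

    Returns the distance map and the largest number of distinct d values
    ever held in the queue at one time.
    """
    d = {s: 0}
    queue = deque([s])
    max_distinct = 1
    while queue:
        u = queue.popleft()
        check_queue_invariant(queue, d)         # after a dequeue
        for v in adj[u]:
            if v not in d:                      # v is undiscovered
                d[v] = d[u] + 1
                queue.append(v)
                check_queue_invariant(queue, d) # after an enqueue
                max_distinct = max(max_distinct, len({d[x] for x in queue}))
    return d, max_distinct

# A small directed graph made up for this example.
adj = {
    1: [2, 3],
    2: [4],
    3: [4, 5],
    4: [6],
    5: [6],
    6: [],
}
d, max_distinct = bfs_checked(adj, 1)
```

If either inequality ever failed, one of the assertions inside `check_queue_invariant` would raise; on this input the queue never holds more than two distinct d values.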

The  following  corollary  shows  that  the  d  values  at  the  time  that  vertices  are 
enqueued  are  monotonically  increasing  over  time. 

Corollary  22.4 

Suppose that vertices $v_i$ and $v_j$ are enqueued during the execution of BFS, and
that $v_i$ is enqueued before $v_j$. Then $v_i.d \le v_j.d$ at the time that $v_j$ is enqueued.

Proof  Immediate  from  Lemma  22.3  and  the  property  that  each  vertex  receives  a 
finite  d  value  at  most  once  during  the  course  of  BFS.  ■ 

We can now prove that breadth-first search correctly finds shortest-path distances.

Theorem  22.5  (Correctness  of  breadth-first  search) 

Let G = (V, E) be a directed or undirected graph, and suppose that BFS is run
on G from a given source vertex $s \in V$. Then, during its execution, BFS discovers
every vertex $v \in V$ that is reachable from the source s, and upon termination,
$v.d = \delta(s, v)$ for all $v \in V$. Moreover, for any vertex $v \ne s$ that is reachable
from s, one of the shortest paths from s to v is a shortest path from s to $v.\pi$
followed by the edge $(v.\pi, v)$.

Proof  Assume, for the purpose of contradiction, that some vertex receives a d
value not equal to its shortest-path distance. Let v be the vertex with minimum
$\delta(s, v)$ that receives such an incorrect d value; clearly $v \ne s$. By
Lemma 22.2, $v.d \ge \delta(s, v)$, and thus we have that $v.d > \delta(s, v)$. Vertex v must be
reachable from s, for if it is not, then $\delta(s, v) = \infty \ge v.d$. Let u be the vertex
immediately preceding v on a shortest path from s to v, so that $\delta(s, v) = \delta(s, u) + 1$.
Because $\delta(s, u) < \delta(s, v)$, and because of how we chose v, we have $u.d = \delta(s, u)$.
Putting these properties together, we have

$$v.d > \delta(s, v) = \delta(s, u) + 1 = u.d + 1 . \tag{22.1}$$

Now consider the time when BFS chooses to dequeue vertex u from Q in
line 11. At this time, vertex v is either white, gray, or black. We shall show
that in each of these cases, we derive a contradiction to inequality (22.1). If v is
white, then line 15 sets v.d = u.d + 1, contradicting inequality (22.1). If v is
black, then it was already removed from the queue and, by Corollary 22.4, we have
$v.d \le u.d$, again contradicting inequality (22.1). If v is gray, then it was painted
gray upon dequeuing some vertex w, which was removed from Q earlier than u
and for which v.d = w.d + 1. By Corollary 22.4, however, $w.d \le u.d$, and so we
have $v.d = w.d + 1 \le u.d + 1$, once again contradicting inequality (22.1).

Thus we conclude that $v.d = \delta(s, v)$ for all $v \in V$. All vertices v reachable
from s must be discovered, for otherwise they would have $\infty = v.d > \delta(s, v)$. To
conclude the proof of the theorem, observe that if $v.\pi = u$, then v.d = u.d + 1.
Thus, we can obtain a shortest path from s to v by taking a shortest path from s
to $v.\pi$ and then traversing the edge $(v.\pi, v)$. ■


Breadth-first  trees 

The procedure BFS builds a breadth-first tree as it searches the graph, as Figure 22.3
illustrates. The tree corresponds to the $\pi$ attributes. More formally, for
a graph G = (V, E) with source s, we define the predecessor subgraph of G as
$G_\pi = (V_\pi, E_\pi)$, where

$$V_\pi = \{v \in V : v.\pi \ne \text{NIL}\} \cup \{s\}$$

and

$$E_\pi = \{(v.\pi, v) : v \in V_\pi - \{s\}\} .$$

The predecessor subgraph $G_\pi$ is a breadth-first tree if $V_\pi$ consists of the vertices
reachable from s and, for all $v \in V_\pi$, the subgraph $G_\pi$ contains a unique simple
path from s to v that is also a shortest path from s to v in G. A breadth-first tree
is in fact a tree, since it is connected and $|E_\pi| = |V_\pi| - 1$ (see Theorem B.2). We
call the edges in $E_\pi$ tree edges.

The  following  lemma  shows  that  the  predecessor  subgraph  produced  by  the  BFS 
procedure  is  a  breadth-first  tree. 

Lemma  22.6 

When applied to a directed or undirected graph G = (V, E), procedure BFS constructs
$\pi$ so that the predecessor subgraph $G_\pi = (V_\pi, E_\pi)$ is a breadth-first tree.

Proof  Line 16 of BFS sets $v.\pi = u$ if and only if $(u, v) \in E$ and $\delta(s, v) < \infty$
(that is, if v is reachable from s), and thus $V_\pi$ consists of the vertices in V reachable
from s. Since $G_\pi$ forms a tree, by Theorem B.2, it contains a unique simple path
from s to each vertex in $V_\pi$. By applying Theorem 22.5 inductively, we conclude
that every such path is a shortest path in G. ■

The following procedure prints out the vertices on a shortest path from s to v,
assuming that BFS has already computed a breadth-first tree:

PRINT-PATH(G, s, v)
1  if v == s
2      print s
3  elseif v.π == NIL
4      print “no path from” s “to” v “exists”
5  else PRINT-PATH(G, s, v.π)
6      print v

This  procedure  runs  in  time  linear  in  the  number  of  vertices  in  the  path  printed, 
since  each  recursive  call  is  for  a  path  one  vertex  shorter. 
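For readers who want to experiment, here is one possible Python rendering of the same idea. It is a sketch under stated assumptions rather than a transcription of PRINT-PATH: the predecessor values are kept in a dictionary `pi` (with `None` for NIL), the helper `bfs_predecessors` and the function name `path_to` are introduced only for this example, and the path is returned as a list instead of being printed.

```python
from collections import deque

def bfs_predecessors(adj, s):
    """Return the predecessor map of a breadth-first tree rooted at s."""
    pi = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in pi:         # v has not been discovered yet
                pi[v] = u
                queue.append(v)
    return pi

def path_to(pi, s, v):
    """Return the list of vertices on the tree path from s to v.

    Returns None when no path from s to v exists, which corresponds to
    the "no path" message printed by the pseudocode.
    """
    if v == s:
        return [s]
    if pi.get(v) is None:           # v was never discovered
        return None
    prefix = path_to(pi, s, pi[v])  # path from s to the predecessor of v
    if prefix is None:
        return None
    prefix.append(v)
    return prefix

# A small directed graph made up for this example; "e" cannot be reached from "s".
adj = {
    "s": ["a", "b"],
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "d": [],
    "e": ["s"],
}
pi = bfs_predecessors(adj, "s")
```

As in the pseudocode, each recursive call handles a path one vertex shorter, so the work is linear in the length of the path returned.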

Exercises 


22.2-1 

Show the d and $\pi$ values that result from running breadth-first search on the directed
graph of Figure 22.2(a), using vertex 3 as the source.


22.2-2 

Show the d and $\pi$ values that result from running breadth-first search on the undirected
graph of Figure 22.3, using vertex u as the source.


22.2-3

Show  that  using  a  single  bit  to  store  each  vertex  color  suffices  by  arguing  that  the 
BFS  procedure  would  produce  the  same  result  if  lines  5  and  14  were  removed. 


22.2-4 

What  is  the  running  time  of  BFS  if  we  represent  its  input  graph  by  an  adjacency 
matrix  and  modify  the  algorithm  to  handle  this  form  of  input? 


22.2-5 

Argue that in a breadth-first search, the value u.d assigned to a vertex u is independent
of the order in which the vertices appear in each adjacency list. Using
Figure 22.3 as an example, show that the breadth-first tree computed by BFS can
depend on the ordering within adjacency lists.


22.2-6 

Give an example of a directed graph G = (V, E), a source vertex $s \in V$, and a
set of tree edges $E_\pi \subseteq E$ such that for each vertex $v \in V$, the unique simple path
in the graph $(V, E_\pi)$ from s to v is a shortest path in G, yet the set of edges $E_\pi$
cannot be produced by running BFS on G, no matter how the vertices are ordered
in each adjacency list.


22.2-7

There are two types of professional wrestlers: “babyfaces” (“good guys”) and
“heels” (“bad guys”). Between any pair of professional wrestlers, there may or
may not be a rivalry. Suppose we have n professional wrestlers and we have a list
of r pairs of wrestlers for which there are rivalries. Give an $O(n + r)$-time algorithm
that determines whether it is possible to designate some of the wrestlers as
babyfaces and the remainder as heels such that each rivalry is between a babyface
and a heel. If it is possible to perform such a designation, your algorithm should
produce it.

22.2-8 *

The diameter of a tree T = (V, E) is defined as $\max_{u, v \in V} \delta(u, v)$, that is, the
largest of all shortest-path distances in the tree. Give an efficient algorithm to
compute the diameter of a tree, and analyze the running time of your algorithm.


22.2-9 

Let G = (V, E) be a connected, undirected graph. Give an $O(V + E)$-time algorithm
to compute a path in G that traverses each edge in E exactly once in each
direction. Describe how you can find your way out of a maze if you are given a
large supply of pennies.


22.3  Depth-first search

The  strategy  followed  by  depth-first  search  is,  as  its  name  implies,  to  search 
“deeper”  in  the  graph  whenever  possible.  Depth-first  search  explores  edges  out 
of  the  most  recently  discovered  vertex  v  that  still  has  unexplored  edges  leaving  it. 
Once all of v’s edges have been explored, the search “backtracks” to explore edges
leaving  the  vertex  from  which  v  was  discovered.  This  process  continues  until  we 
have  discovered  all  the  vertices  that  are  reachable  from  the  original  source  vertex. 
If  any  undiscovered  vertices  remain,  then  depth-first  search  selects  one  of  them  as 
a  new  source,  and  it  repeats  the  search  from  that  source.  The  algorithm  repeats  this 
entire process until it has discovered every vertex.³

As in breadth-first search, whenever depth-first search discovers a vertex v during
a scan of the adjacency list of an already discovered vertex u, it records this
event by setting v’s predecessor attribute $v.\pi$ to u. Unlike breadth-first search,
whose predecessor subgraph forms a tree, the predecessor subgraph produced by
a depth-first search may be composed of several trees, because the search may
repeat from multiple sources. Therefore, we define the predecessor subgraph of
a depth-first search slightly differently from that of a breadth-first search: we let
$G_\pi = (V, E_\pi)$, where

$$E_\pi = \{(v.\pi, v) : v \in V \text{ and } v.\pi \ne \text{NIL}\} .$$

The predecessor subgraph of a depth-first search forms a depth-first forest comprising
several depth-first trees. The edges in $E_\pi$ are tree edges.

As  in  breadth-first  search,  depth-first  search  colors  vertices  during  the  search  to 
indicate  their  state.  Each  vertex  is  initially  white,  is  grayed  when  it  is  discovered 
in the search, and is blackened when it is finished, that is, when its adjacency list
has  been  examined  completely.  This  technique  guarantees  that  each  vertex  ends  up 
in  exactly  one  depth-first  tree,  so  that  these  trees  are  disjoint. 

Besides creating a depth-first forest, depth-first search also timestamps each vertex.
Each vertex v has two timestamps: the first timestamp v.d records when v
is first discovered (and grayed), and the second timestamp v.f records when the
search finishes examining v’s adjacency list (and blackens v). These timestamps
provide important information about the structure of the graph and are generally
helpful in reasoning about the behavior of depth-first search.


3 It may seem arbitrary that breadth-first search is limited to only one source whereas depth-first
search may search from multiple sources. Although conceptually, breadth-first search could proceed
from multiple sources and depth-first search could be limited to one source, our approach reflects how
the results of these searches are typically used. Breadth-first search usually serves to find shortest-path
distances (and the associated predecessor subgraph) from a given source. Depth-first search is
often a subroutine in another algorithm, as we shall see later in this chapter.

The procedure DFS below records when it discovers vertex u in the attribute u.d
and when it finishes vertex u in the attribute u.f. These timestamps are integers
between 1 and $2|V|$, since there is one discovery event and one finishing event for
each of the $|V|$ vertices. For every vertex u,

$$u.d < u.f . \tag{22.2}$$

Vertex u is WHITE before time u.d, GRAY between time u.d and time u.f, and
BLACK thereafter.

The  following  pseudocode  is  the  basic  depth-first-search  algorithm.  The  input 
graph  G  may  be  undirected  or  directed.  The  variable  time  is  a  global  variable  that 
we  use  for  timestamping. 

DFS(G)
1  for each vertex u ∈ G.V
2      u.color = WHITE
3      u.π = NIL
4  time = 0
5  for each vertex u ∈ G.V
6      if u.color == WHITE
7          DFS-VISIT(G, u)

DFS-VISIT(G, u)
 1  time = time + 1              // white vertex u has just been discovered
 2  u.d = time
 3  u.color = GRAY
 4  for each v ∈ G.Adj[u]        // explore edge (u, v)
 5      if v.color == WHITE
 6          v.π = u
 7          DFS-VISIT(G, v)
 8  u.color = BLACK              // blacken u; it is finished
 9  time = time + 1
10  u.f = time

Figure  22.4  illustrates  the  progress  of  DFS  on  the  graph  shown  in  Figure  22.2. 

Procedure DFS works as follows. Lines 1-3 paint all vertices white and initialize
their $\pi$ attributes to NIL. Line 4 resets the global time counter. Lines 5-7
check each vertex in V in turn and, when a white vertex is found, visit it using
DFS-VISIT. Every time DFS-VISIT(G, u) is called in line 7, vertex u becomes
the root of a new tree in the depth-first forest. When DFS returns, every vertex u
has been assigned a discovery time u.d and a finishing time u.f.

Figure 22.4  The progress of the depth-first-search algorithm DFS on a directed graph. As edges
are explored by the algorithm, they are shown as either shaded (if they are tree edges) or dashed
(otherwise). Nontree edges are labeled B, C, or F according to whether they are back, cross, or
forward edges. Timestamps within vertices indicate discovery time/finishing times.

In each call DFS-VISIT(G, u), vertex u is initially white. Line 1 increments
the global variable time, line 2 records the new value of time as the discovery
time u.d, and line 3 paints u gray. Lines 4-7 examine each vertex v adjacent to u
and recursively visit v if it is white. As each vertex $v \in Adj[u]$ is considered in
line 4, we say that edge (u, v) is explored by the depth-first search. Finally, after
every edge leaving u has been explored, lines 8-10 paint u black, increment time,
and record the finishing time in u.f.
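The walkthrough above can be tried out with a short Python sketch. It is an illustration under assumptions, not the book's code: the graph is a dictionary of adjacency lists, a vertex counts as white exactly when it has no discovery time yet, the counter is a local variable of `dfs` instead of a global, and the six-vertex example graph is chosen only for this demonstration. Because it recurses like DFS-VISIT, very deep graphs could exceed Python's default recursion limit.

```python
def dfs(adj):
    """Depth-first search over every vertex of an adjacency-list graph.

    Returns discovery times d, finishing times f, and predecessors pi.
    Vertices are considered in the iteration order of adj, and neighbors
    in the order of each adjacency list.
    """
    d, f = {}, {}
    pi = {u: None for u in adj}     # None plays the role of NIL
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time                 # u is discovered (gray from now on)
        for v in adj[u]:            # explore edge (u, v)
            if v not in d:          # v is white
                pi[v] = u
                visit(v)
        time += 1
        f[u] = time                 # u is finished (black from now on)

    for u in adj:
        if u not in d:              # u starts a new depth-first tree
            visit(u)
    return d, f, pi

# A small directed graph chosen for illustration (includes a self-loop at z).
adj = {
    "u": ["v", "x"],
    "v": ["y"],
    "w": ["y", "z"],
    "x": ["v"],
    "y": ["x"],
    "z": ["z"],
}
d, f, pi = dfs(adj)
```

With this vertex order the search produces two depth-first trees, rooted at u and at w, and the timestamps run from 1 through 12, twice the number of vertices.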

Note that the results of depth-first search may depend upon the order in which
line 5 of DFS examines the vertices and upon the order in which line 4 of DFS-VISIT
visits the neighbors of a vertex. These different visitation orders tend not
to cause problems in practice, as we can usually use any depth-first search result
effectively, with essentially equivalent results.

What is the running time of DFS? The loops on lines 1-3 and lines 5-7 of DFS
take time $\Theta(V)$, exclusive of the time to execute the calls to DFS-VISIT. As we did
for breadth-first search, we use aggregate analysis. The procedure DFS-VISIT is
called exactly once for each vertex $v \in V$, since the vertex u on which DFS-VISIT
is invoked must be white and the first thing DFS-VISIT does is paint vertex u gray.
During an execution of DFS-VISIT(G, v), the loop on lines 4-7 executes $|Adj[v]|$
times. Since

$$\sum_{v \in V} |Adj[v]| = \Theta(E) ,$$

the total cost of executing lines 4-7 of DFS-VISIT is $\Theta(E)$. The running time of
DFS is therefore $\Theta(V + E)$.

Properties  of  depth-first  search 

Depth-first search yields valuable information about the structure of a graph. Perhaps
the most basic property of depth-first search is that the predecessor subgraph
$G_\pi$ does indeed form a forest of trees, since the structure of the depth-first
trees exactly mirrors the structure of recursive calls of DFS-VISIT. That is,
$u = v.\pi$ if and only if DFS-VISIT(G, v) was called during a search of u’s adjacency
list. Additionally, vertex v is a descendant of vertex u in the depth-first
forest if and only if v is discovered during the time in which u is gray.

Another  important  property  of  depth-first  search  is  that  discovery  and  finishing 
times  have  parenthesis  structure.  If  we  represent  the  discovery  of  vertex  u  with 
a left parenthesis “(u” and represent its finishing by a right parenthesis “u)”, then
the  history  of  discoveries  and  finishings  makes  a  well-formed  expression  in  the 
sense  that  the  parentheses  are  properly  nested.  For  example,  the  depth-first  search 
of  Figure  22.5(a)  corresponds  to  the  parenthesization  shown  in  Figure  22.5(b).  The 
following  theorem  provides  another  way  to  characterize  the  parenthesis  structure. 

Theorem 22.7 (Parenthesis theorem)

In any depth-first search of a (directed or undirected) graph G = (V, E), for any
two vertices u and v, exactly one of the following three conditions holds:

•  the intervals [u.d, u.f] and [v.d, v.f] are entirely disjoint, and neither u nor v
is a descendant of the other in the depth-first forest,

•  the interval [u.d, u.f] is contained entirely within the interval [v.d, v.f], and u
is a descendant of v in a depth-first tree, or

•  the interval [v.d, v.f] is contained entirely within the interval [u.d, u.f], and v
is a descendant of u in a depth-first tree.

Figure 22.5  Properties of depth-first search. (a) The result of a depth-first search of a directed
graph. Vertices are timestamped and edge types are indicated as in Figure 22.4. (b) Intervals for
the discovery time and finishing time of each vertex correspond to the parenthesization shown. Each
rectangle spans the interval given by the discovery and finishing times of the corresponding vertex.
Only tree edges are shown. If two intervals overlap, then one is nested within the other, and the
vertex corresponding to the smaller interval is a descendant of the vertex corresponding to the larger.
(c) The graph of part (a) redrawn with all tree and forward edges going down within a depth-first tree
and all back edges going up from a descendant to an ancestor.

Proof  We begin with the case in which u.d < v.d. We consider two subcases,
according to whether v.d < u.f or not. The first subcase occurs when v.d < u.f,
so v was discovered while u was still gray, which implies that v is a descendant
of u. Moreover, since v was discovered more recently than u, all of its outgoing
edges are explored, and v is finished, before the search returns to and finishes u.
In this case, therefore, the interval [v.d, v.f] is entirely contained within
the interval [u.d, u.f]. In the other subcase, u.f < v.d, and by inequality (22.2),
u.d < u.f < v.d < v.f; thus the intervals [u.d, u.f] and [v.d, v.f] are disjoint.
Because the intervals are disjoint, neither vertex was discovered while the other
was gray, and so neither vertex is a descendant of the other.

The  case  in  which  v.d  <  u.d  is  similar,  with  the  roles  of  u  and  v  reversed  in  the 
above  argument.  ■ 

Corollary 22.8 (Nesting of descendants’ intervals)

Vertex v is a proper descendant of vertex u in the depth-first forest for a (directed
or undirected) graph G if and only if u.d < v.d < v.f < u.f.

Proof  Immediate  from  Theorem  22.7.  ■ 
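To see the parenthesis structure concretely, the following Python sketch records each discovery and finish as an event and then checks the interval condition for every pair of vertices. The function names `dfs_events`, `parenthesization`, and `intervals_nest_or_are_disjoint`, the dictionary graph representation, and the example graph are all assumptions made for this illustration.

```python
def dfs_events(adj):
    """Run depth-first search and return (d, f, events).

    events lists ("open", u) at each discovery and ("close", u) at each
    finish, in timestamp order, so the k-th event happens at time k.
    """
    d, f, events = {}, {}, []

    def visit(u):
        events.append(("open", u))
        d[u] = len(events)          # discovery time
        for v in adj[u]:
            if v not in d:          # v is white
                visit(v)
        events.append(("close", u))
        f[u] = len(events)          # finishing time

    for u in adj:
        if u not in d:
            visit(u)
    return d, f, events

def parenthesization(events):
    """Render the events as "(u" for a discovery and "u)" for a finish."""
    return " ".join(f"({u}" if kind == "open" else f"{u})" for kind, u in events)

def intervals_nest_or_are_disjoint(d, f):
    """Check that exactly one interval relation holds for every pair of vertices."""
    for u in d:
        for v in d:
            if u == v:
                continue
            disjoint = f[u] < d[v] or f[v] < d[u]
            u_in_v = d[v] < d[u] < f[u] < f[v]
            v_in_u = d[u] < d[v] < f[v] < f[u]
            if [disjoint, u_in_v, v_in_u].count(True) != 1:
                return False
    return True

# A small directed graph chosen for illustration.
adj = {
    "u": ["v", "x"],
    "v": ["y"],
    "w": ["y", "z"],
    "x": ["v"],
    "y": ["x"],
    "z": ["z"],
}
d, f, events = dfs_events(adj)
text = parenthesization(events)
```

The resulting string is properly nested, and the strict nesting `d[u] < d[v] < f[v] < f[u]` identifies the proper descendants, as in Corollary 22.8.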

The  next  theorem  gives  another  important  characterization  of  when  one  vertex 
is  a  descendant  of  another  in  the  depth-first  forest. 

Theorem  22.9  (White-path  theorem) 

In  a  depth-first  forest  of  a  (directed  or  undirected)  graph  G  =  (V,  E),  vertex  v  is 
a  descendant  of  vertex  u  if  and  only  if  at  the  time  u.d  that  the  search  discovers  u, 
there  is  a  path  from  u  to  v  consisting  entirely  of  white  vertices. 

Proof  $\Rightarrow$: If v = u, then the path from u to v contains just vertex u, which is still
white when we set the value of u.d. Now, suppose that v is a proper descendant
of u in the depth-first forest. By Corollary 22.8, u.d < v.d, and so v is white at
time u.d. Since v can be any descendant of u, all vertices on the unique simple
path from u to v in the depth-first forest are white at time u.d.

$\Leftarrow$: Suppose that there is a path of white vertices from u to v at time u.d, but v
does not become a descendant of u in the depth-first tree. Without loss of generality,
assume that every vertex other than v along the path becomes a descendant of u.
(Otherwise, let v be the closest vertex to u along the path that doesn’t become a
descendant of u.) Let w be the predecessor of v in the path, so that w is a descendant
of u (w and u may in fact be the same vertex). By Corollary 22.8, $w.f \le u.f$.
Because v must be discovered after u is discovered, but before w is finished, we have
$u.d < v.d < w.f \le u.f$. Theorem 22.7 then implies that the interval [v.d, v.f]
is contained entirely within the interval [u.d, u.f]. By Corollary 22.8, v must after
all be a descendant of u. ■

Classification  of  edges 

Another  interesting  property  of  depth-first  search  is  that  the  search  can  be  used 
to classify the edges of the input graph G = (V, E). The type of each edge can
provide  important  information  about  a  graph.  For  example,  in  the  next  section,  we 
shall  see  that  a  directed  graph  is  acyclic  if  and  only  if  a  depth-first  search  yields  no 
“back”  edges  (Lemma  22.11). 

We can define four edge types in terms of the depth-first forest $G_\pi$ produced by
a depth-first search on G:

1.  Tree edges are edges in the depth-first forest $G_\pi$. Edge (u, v) is a tree edge if v
was first discovered by exploring edge (u, v).

2.  Back  edges  are  those  edges  (u,  v)  connecting  a  vertex  u  to  an  ancestor  v  in  a 
depth-first  tree.  We  consider  self-loops,  which  may  occur  in  directed  graphs,  to 
be  back  edges. 

3.  Forward edges are those nontree edges (u, v) connecting a vertex u to a descendant
v in a depth-first tree.

4.  Cross  edges  are  all  other  edges.  They  can  go  between  vertices  in  the  same 
depth-first  tree,  as  long  as  one  vertex  is  not  an  ancestor  of  the  other,  or  they  can 
go  between  vertices  in  different  depth-first  trees. 

In  Figures  22.4  and  22.5,  edge  labels  indicate  edge  types.  Figure  22.5(c)  also  shows 
how  to  redraw  the  graph  of  Figure  22.5(a)  so  that  all  tree  and  forward  edges  head 
downward  in  a  depth-first  tree  and  all  back  edges  go  up.  We  can  redraw  any  graph 
in  this  fashion. 

The DFS algorithm has enough information to classify some edges as it encounters
them. The key idea is that when we first explore an edge (u, v), the color of
vertex v tells us something about the edge:

1.  WHITE  indicates  a  tree  edge, 

2.  GRAY  indicates  a  back  edge,  and 

3.  BLACK  indicates  a  forward  or  cross  edge. 

The first case is immediate from the specification of the algorithm. For the second
case, observe that the gray vertices always form a linear chain of descendants
corresponding to the stack of active DFS-VISIT invocations; the number of gray
vertices is one more than the depth in the depth-first forest of the vertex most recently
discovered. Exploration always proceeds from the deepest gray vertex, so
an edge that reaches another gray vertex has reached an ancestor. The third case
handles the remaining possibility; Exercise 22.3-5 asks you to show that such an
edge (u, v) is a forward edge if u.d < v.d and a cross edge if u.d > v.d.
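The color-based rules can be turned into a short classifier. The Python sketch below is one way to do it under a few assumptions: the graph is a dictionary of adjacency lists with no parallel edges, the function name `classify_edges` is made up for this example, and the counter records only discovery order, which is enough to compare u.d with v.d.

```python
def classify_edges(adj):
    """Classify every edge of a directed graph during one depth-first search.

    Returns a dictionary mapping each edge (u, v) to "tree", "back",
    "forward", or "cross", decided by the color of v when the edge is
    first explored.
    """
    WHITE, GRAY, BLACK = "white", "gray", "black"
    color = {u: WHITE for u in adj}
    d = {}                          # discovery order (finish events are not counted)
    kinds = {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == WHITE:
                kinds[(u, v)] = "tree"
                visit(v)
            elif color[v] == GRAY:
                kinds[(u, v)] = "back"      # v is an ancestor of u (or u itself)
            elif d[u] < d[v]:
                kinds[(u, v)] = "forward"   # black v discovered after u
            else:
                kinds[(u, v)] = "cross"     # black v discovered before u
        color[u] = BLACK

    for u in adj:
        if color[u] == WHITE:
            visit(u)
    return kinds

# A small directed graph chosen for illustration (includes a self-loop at z).
adj = {
    "u": ["v", "x"],
    "v": ["y"],
    "w": ["y", "z"],
    "x": ["v"],
    "y": ["x"],
    "z": ["z"],
}
kinds = classify_edges(adj)
```

Note that the self-loop (z, z) is reported as a back edge, because z is gray while its own adjacency list is being scanned.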

An undirected graph may entail some ambiguity in how we classify edges,
since (u, v) and (v, u) are really the same edge. In such a case, we classify the
edge as the first type in the classification list that applies. Equivalently (see
Exercise 22.3-6), we classify the edge according to whichever of (u, v) or (v, u) the
search encounters first.

We  now  show  that  forward  and  cross  edges  never  occur  in  a  depth-first  search  of 
an  undirected  graph. 

Theorem  22.10 

In  a  depth-first  search  of  an  undirected  graph  G,  every  edge  of  G  is  either  a  tree 
edge  or  a  back  edge. 

Proof  Let (u, v) be an arbitrary edge of G, and suppose without loss of generality
that u.d < v.d. Then the search must discover and finish v before it finishes u
(while u is gray), since v is on u’s adjacency list. If the first time that the search
explores edge (u, v), it is in the direction from u to v, then v is undiscovered
(white) until that time, for otherwise the search would have explored this edge
already in the direction from v to u. Thus, (u, v) becomes a tree edge. If the
search explores (u, v) first in the direction from v to u, then (u, v) is a back edge,
since u is still gray at the time the edge is first explored. ■
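The theorem can be observed experimentally. The sketch below classifies each undirected edge by the direction in which the search explores it first; it assumes a simple undirected graph stored as a dictionary in which every edge appears in both adjacency lists, and the name `classify_undirected` and the example graph are inventions of this illustration. The "forward or cross" branch is kept only so that a violation of the theorem would be visible in the output.

```python
def classify_undirected(adj):
    """Classify each undirected edge by the direction DFS explores it first.

    Edges are keyed by frozenset({u, v}) so that (u, v) and (v, u) share
    one entry.
    """
    color = {u: "white" for u in adj}
    kinds = {}

    def visit(u):
        color[u] = "gray"
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in kinds:
                continue                    # already explored from the other endpoint
            if color[v] == "white":
                kinds[edge] = "tree"
                visit(v)
            elif color[v] == "gray":
                kinds[edge] = "back"
            else:
                kinds[edge] = "forward or cross"   # not expected, per Theorem 22.10
        color[u] = "black"

    for u in adj:
        if color[u] == "white":
            visit(u)
    return kinds

# A small connected undirected graph made up for this example.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d"],
}
kinds = classify_undirected(adj)
labels = sorted(kinds.values())
```

On this graph the four tree edges form a spanning tree, and the two remaining edges, {a, c} and {c, e}, are back edges.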

We  shall  see  several  applications  of  these  theorems  in  the  following  sections. 

Exercises 


22.3-1 

Make a 3-by-3 chart with row and column labels WHITE, GRAY, and BLACK. In
each cell (i, j), indicate whether, at any point during a depth-first search of a directed
graph, there can be an edge from a vertex of color i to a vertex of color j.
For each possible edge, indicate what edge types it can be. Make a second such
chart for depth-first search of an undirected graph.


22.3-2 

Show  how  depth-first  search  works  on  the  graph  of  Figure  22.6.  Assume  that  the 
for  loop  of  lines  5-7  of  the  DFS  procedure  considers  the  vertices  in  alphabetical 
order,  and  assume  that  each  adjacency  list  is  ordered  alphabetically.  Show  the 
discovery  and  finishing  times  for  each  vertex,  and  show  the  classification  of  each 
edge. 

Figure 22.6 A directed graph for use in Exercises 22.3-2 and 22.5-2.


22.3-3

Show  the  parenthesis  structure  of  the  depth-first  search  of  Figure  22.4. 

22.3-4

Show that using a single bit to store each vertex color suffices by arguing that
the DFS procedure would produce the same result if line 3 of DFS-VISIT was
removed.

22.3-5

Show that edge (u, v) is

a. a tree edge or forward edge if and only if u.d < v.d < v.f < u.f,

b. a back edge if and only if v.d ≤ u.d < u.f ≤ v.f, and

c. a cross edge if and only if v.d < v.f < u.d < u.f.

22.3-6

Show that in an undirected graph, classifying an edge (u, v) as a tree edge or a back
edge according to whether (u, v) or (v, u) is encountered first during the depth-first
search is equivalent to classifying it according to the ordering of the four types in
the classification scheme.


22.3-7

Rewrite the procedure DFS, using a stack to eliminate recursion.

22.3-8

Give  a  counterexample  to  the  conjecture  that  if  a  directed  graph  G  contains  a  path 
from  u  to  v,  and  if  u.d  <  v.d  in  a  depth-first  search  of  G,  then  v  is  a  descendant 
of  u  in  the  depth-first  forest  produced. 

22.3-9

Give a counterexample to the conjecture that if a directed graph G contains a path
from u to v, then any depth-first search must result in v.d < u.f.

22.3-10

Modify  the  pseudocode  for  depth-first  search  so  that  it  prints  out  every  edge  in  the 
directed  graph  G,  together  with  its  type.  Show  what  modifications,  if  any,  you  need 
to  make  if  G  is  undirected. 

22.3-11

Explain how a vertex u of a directed graph can end up in a depth-first tree
containing only u, even though u has both incoming and outgoing edges in G.

22.3-12

Show  that  we  can  use  a  depth-first  search  of  an  undirected  graph  G  to  identify  the 
connected  components  of  G,  and  that  the  depth-first  forest  contains  as  many  trees 
as  G  has  connected  components.  More  precisely,  show  how  to  modify  depth-first 
search  so  that  it  assigns  to  each  vertex  v  an  integer  label  v.cc  between  1  and  k, 
where  k  is  the  number  of  connected  components  of  G,  such  that  u.cc  =  v.cc  if 
and  only  if  u  and  v  are  in  the  same  connected  component. 

22.3-13 *

A directed graph G = (V, E) is singly connected if u ⇝ v implies that G contains
at most one simple path from u to v for all vertices u, v ∈ V. Give an efficient
algorithm to determine whether or not a directed graph is singly connected.


22.4  Topological  sort 

This  section  shows  how  we  can  use  depth-first  search  to  perform  a  topological  sort 
of  a  directed  acyclic  graph,  or  a  “dag”  as  it  is  sometimes  called.  A  topological  sort 
of a dag G = (V, E) is a linear ordering of all its vertices such that if G contains an
edge (u, v), then u appears before v in the ordering. (If the graph contains a cycle,
then no linear ordering is possible.) We can view a topological sort of a graph as
an  ordering  of  its  vertices  along  a  horizontal  line  so  that  all  directed  edges  go  from 
left  to  right.  Topological  sorting  is  thus  different  from  the  usual  kind  of  “sorting” 
studied  in  Part  II. 

Many applications use directed acyclic graphs to indicate precedences among
events. Figure 22.7 gives an example that arises when Professor Bumstead gets
dressed in the morning. The professor must don certain garments before others
(e.g., socks before shoes). Other items may be put on in any order (e.g., socks and
pants). A directed edge (u, v) in the dag of Figure 22.7(a) indicates that garment u
must be donned before garment v. A topological sort of this dag therefore gives an
order for getting dressed. Figure 22.7(b) shows the topologically sorted dag as an
ordering of vertices along a horizontal line such that all directed edges go from left
to right.

Figure 22.7 (a) Professor Bumstead topologically sorts his clothing when getting dressed. Each
directed edge (u, v) means that garment u must be put on before garment v. The discovery and
finishing times from a depth-first search are shown next to each vertex. (b) The same graph shown
topologically sorted, with its vertices arranged from left to right in order of decreasing finishing time.
All directed edges go from left to right.

The following simple algorithm topologically sorts a dag:

TOPOLOGICAL-SORT(G)
1  call DFS(G) to compute finishing times v.f for each vertex v
2  as each vertex is finished, insert it onto the front of a linked list
3  return the linked list of vertices

Figure  22.7(b)  shows  how  the  topologically  sorted  vertices  appear  in  reverse  order 
of  their  finishing  times. 

We can perform a topological sort in time Θ(V + E), since depth-first search
takes Θ(V + E) time and it takes O(1) time to insert each of the |V| vertices onto
the front of the linked list.
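
The three steps of the procedure can be sketched in Python as follows. This is a
minimal illustration under stated assumptions: the graph is a dictionary mapping
each vertex to a list of its successors, a deque stands in for the linked list, and
the name topological_sort and the small garment dag are hypothetical (the dag is
not the graph of Figure 22.7). The input is assumed to be acyclic; the sketch does
not check for cycles.

```python
from collections import deque


def topological_sort(adj):
    """Return the vertices of a dag so that every edge goes from left to right.

    adj maps each vertex to a list of its successors (hypothetical representation).
    """
    visited = set()
    order = deque()  # plays the role of the linked list

    def visit(u):
        visited.add(u)
        for v in adj.get(u, ()):
            if v not in visited:
                visit(v)
        # u is finished here, so insert it onto the front of the list.
        order.appendleft(u)

    for u in adj:
        if u not in visited:
            visit(u)
    return list(order)


# Illustrative dag: socks and pants before shoes, pants before belt.
garments = {"socks": ["shoes"], "pants": ["shoes", "belt"], "belt": [], "shoes": []}
print(topological_sort(garments))
```

Inserting at the front of the deque takes O(1) time per vertex, which matches the
accounting above.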

We prove the correctness of this algorithm using the following key lemma
characterizing directed acyclic graphs.

Lemma 22.11

A  directed  graph  G  is  acyclic  if  and  only  if  a  depth-first  search  of  G  yields  no  back 
edges. 

Proof ⇒: Suppose that a depth-first search produces a back edge (u, v). Then
vertex v is an ancestor of vertex u in the depth-first forest. Thus, G contains a path
from v to u, and the back edge (u, v) completes a cycle.

⇐: Suppose that G contains a cycle c. We show that a depth-first search of G
yields a back edge. Let v be the first vertex to be discovered in c, and let (u, v) be
the preceding edge in c. At time v.d, the vertices of c form a path of white vertices
from v to u. By the white-path theorem, vertex u becomes a descendant of v in the
depth-first forest. Therefore, (u, v) is a back edge. ■
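
Lemma 22.11 suggests a direct test for acyclicity: run a depth-first search and
report a cycle as soon as an edge leads to a gray vertex. The Python sketch below
illustrates this idea; the name has_cycle, the integer color constants, and the
dictionary-of-lists graph representation are assumptions made for the example.

```python
def has_cycle(adj):
    """Return True if the directed graph contains a cycle (i.e., DFS finds a back edge).

    adj maps each vertex to a list of its successors (hypothetical representation).
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}  # vertices absent from the dictionary are treated as white

    def visit(u):
        color[u] = GRAY
        for v in adj.get(u, ()):
            c = color.get(v, WHITE)
            if c == GRAY:
                return True  # (u, v) is a back edge: v is an ancestor of u
            if c == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in adj)
```

An edge to a black vertex is a forward or cross edge and does not indicate a cycle,
which is why only the gray case returns True.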

Theorem  22.12 

Topological-Sort  produces  a  topological  sort  of  the  directed  acyclic  graph 
provided  as  its  input. 

Proof Suppose that DFS is run on a given dag G = (V, E) to determine
finishing times for its vertices. It suffices to show that for any pair of distinct
vertices u, v ∈ V, if G contains an edge from u to v, then v.f < u.f. Consider any
edge (u, v) explored by DFS(G). When this edge is explored, v cannot be gray,
since then v would be an ancestor of u and (u, v) would be a back edge,
contradicting Lemma 22.11. Therefore, v must be either white or black. If v is white,
it becomes a descendant of u, and so v.f < u.f. If v is black, it has already been
finished, so that v.f has already been set. Because we are still exploring from u, we
have yet to assign a timestamp to u.f, and so once we do, we will have v.f < u.f
as well. Thus, for any edge (u, v) in the dag, we have v.f < u.f, proving the
theorem. ■

Exercises 


22.4-1 

Show  the  ordering  of  vertices  produced  by  TOPOLOGICAL-SORT  when  it  is  run  on 
the  dag  of  Figure  22.8,  under  the  assumption  of  Exercise  22.3-2. 


22.4-2 

Give  a  linear-time  algorithm  that  takes  as  input  a  directed  acyclic  graph  G  = 
(V,  E)  and  two  vertices  s  and  t,  and  returns  the  number  of  simple  paths  from  s 
to  t  in  G.  For  example,  the  directed  acyclic  graph  of  Figure  22.8  contains  exactly 
four simple paths from vertex p to vertex v: pov, poryv, posryv, and psryv.
(Your  algorithm  needs  only  to  count  the  simple  paths,  not  list  them.) 

Figure 22.8 A dag for topological sorting.


22.4-3

Give an algorithm that determines whether or not a given undirected graph G =
(V, E) contains a cycle. Your algorithm should run in O(V) time, independent
of |E|.

22.4-4

Prove or disprove: If a directed graph G contains cycles, then
TOPOLOGICAL-SORT(G) produces a vertex ordering that minimizes the number of “bad” edges
that are inconsistent with the ordering produced.


22.4-5 

Another way to perform topological sorting on a directed acyclic graph G =
(V, E) is to repeatedly find a vertex of in-degree 0, output it, and remove it and
all of its outgoing edges from the graph. Explain how to implement this idea so
that it runs in time O(V + E). What happens to this algorithm if G has cycles?


22.5  Strongly  connected  components 

We now consider a classic application of depth-first search: decomposing a
directed graph into its strongly connected components. This section shows how to do
so using two depth-first searches. Many algorithms that work with directed graphs
begin with such a decomposition. After decomposing the graph into strongly
connected components, such algorithms run separately on each one and then combine
the solutions according to the structure of connections among components.

Recall from Appendix B that a strongly connected component of a directed
graph G = (V, E) is a maximal set of vertices C ⊆ V such that for every pair
of vertices u and v in C, we have both u ⇝ v and v ⇝ u; that is, vertices u and v
are reachable from each other. Figure 22.9 shows an example.

Figure 22.9 (a) A directed graph G. Each shaded region is a strongly connected component of G.
Each vertex is labeled with its discovery and finishing times in a depth-first search, and tree edges
are shaded. (b) The graph G^T, the transpose of G, with the depth-first forest computed in line 3
of STRONGLY-CONNECTED-COMPONENTS shown and tree edges shaded. Each strongly connected
component corresponds to one depth-first tree. Vertices b, c, g, and h, which are heavily shaded, are
the roots of the depth-first trees produced by the depth-first search of G^T. (c) The acyclic component
graph G^SCC obtained by contracting all edges within each strongly connected component of G so
that only a single vertex remains in each component.


Our algorithm for finding strongly connected components of a graph G =
(V, E) uses the transpose of G, which we defined in Exercise 22.1-3 to be the
graph G^T = (V, E^T), where E^T = {(u, v) : (v, u) ∈ E}. That is, E^T consists of
the edges of G with their directions reversed. Given an adjacency-list
representation of G, the time to create G^T is O(V + E). It is interesting to observe that G
and G^T have exactly the same strongly connected components: u and v are
reachable from each other in G if and only if they are reachable from each other in G^T.
Figure 22.9(b) shows the transpose of the graph in Figure 22.9(a), with the strongly
connected components shaded.
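
As an illustration of the O(V + E) bound, the transpose can be built in one pass
over the adjacency lists. The Python sketch below assumes a dictionary-of-lists
representation; the name transpose is hypothetical and chosen only for this example.

```python
def transpose(adj):
    """Return adjacency lists for G^T given adjacency lists for G.

    adj maps each vertex to a list of its successors (hypothetical representation).
    """
    adj_t = {u: [] for u in adj}  # O(V): one empty list per vertex
    for u, successors in adj.items():
        for v in successors:  # O(E): each edge (u, v) becomes (v, u)
            adj_t.setdefault(v, []).append(u)
    return adj_t
```

Each vertex and each edge is handled a constant number of times, and transposing
twice recovers the original edge set.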

The following linear-time (i.e., Θ(V + E)-time) algorithm computes the strongly
connected components of a directed graph G = (V, E) using two depth-first
searches, one on G and one on G^T.

STRONGLY-CONNECTED-COMPONENTS(G)
1  call DFS(G) to compute finishing times u.f for each vertex u
2  compute G^T
3  call DFS(G^T), but in the main loop of DFS, consider the vertices
       in order of decreasing u.f (as computed in line 1)
4  output the vertices of each tree in the depth-first forest formed in line 3 as a
       separate strongly connected component
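
The four lines of the procedure can be sketched in Python as follows. The sketch is
an illustration under stated assumptions rather than a definitive implementation:
the graph is a dictionary mapping each vertex to a list of its successors, the
finishing times are represented implicitly by the order in which vertices finish,
and the names strongly_connected_components, visit, and collect are hypothetical.

```python
def strongly_connected_components(adj):
    """Return a list of components, each a list of vertices.

    adj maps each vertex to a list of its successors (hypothetical representation).
    """
    # Line 1: DFS on G; finish_order lists vertices by increasing finishing time.
    visited = set()
    finish_order = []

    def visit(u):
        visited.add(u)
        for v in adj.get(u, ()):
            if v not in visited:
                visit(v)
        finish_order.append(u)

    for u in adj:
        if u not in visited:
            visit(u)

    # Line 2: compute G^T by reversing every edge.
    adj_t = {u: [] for u in visited}
    for u in adj:
        for v in adj[u]:
            adj_t[v].append(u)

    # Line 3: DFS on G^T, choosing roots in order of decreasing finishing time.
    assigned = set()
    components = []

    def collect(u, component):
        assigned.add(u)
        component.append(u)
        for v in adj_t[u]:
            if v not in assigned:
                collect(v, component)

    for u in reversed(finish_order):
        if u not in assigned:
            component = []
            collect(u, component)
            components.append(component)  # Line 4: one tree, one component
    return components
```

Storing the finish order in a list avoids sorting by u.f, so the whole sketch makes
two passes over the vertices and edges plus one pass to build the transpose.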

The idea behind this algorithm comes from a key property of the component
graph G^SCC = (V^SCC, E^SCC), which we define as follows. Suppose that G
has strongly connected components C_1, C_2, ..., C_k. The vertex set V^SCC is
{v_1, v_2, ..., v_k}, and it contains a vertex v_i for each strongly connected
component C_i of G. There is an edge (v_i, v_j) ∈ E^SCC if G contains a directed edge (x, y)
for some x ∈ C_i and some y ∈ C_j. Looked at another way, by contracting all
edges whose incident vertices are within the same strongly connected component
of G, the resulting graph is G^SCC. Figure 22.9(c) shows the component graph of
the graph in Figure 22.9(a).
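
The definition translates almost directly into code once the components are known.
The following Python sketch assumes the components are supplied as a list of vertex
lists and represents vertex v_i of the component graph by the index i; the name
component_graph is hypothetical. It uses a hash set to suppress duplicate edges,
which is a convenience of this sketch and not necessarily the method intended for
Exercise 22.5-5.

```python
def component_graph(adj, components):
    """Return the edge set of the component graph as pairs of component indices.

    adj maps each vertex to a list of its successors; components is a list of
    vertex lists that partition the vertices (both hypothetical representations).
    """
    component_of = {v: i for i, comp in enumerate(components) for v in comp}
    edges = set()
    for x in adj:
        for y in adj[x]:
            i, j = component_of[x], component_of[y]
            # Edges inside one component are contracted away, so keep i != j only.
            if i != j:
                edges.add((i, j))
    return edges
```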

The  key  property  is  that  the  component  graph  is  a  dag,  which  the  following 
lemma  implies. 

Lemma  22.13 

Let C and C' be distinct strongly connected components in directed graph G =
(V, E), let u, v ∈ C, let u', v' ∈ C', and suppose that G contains a path u ⇝ u'.
Then G cannot also contain a path v' ⇝ v.

Proof If G contains a path v' ⇝ v, then it contains paths u ⇝ u' ⇝ v' and
v' ⇝ v ⇝ u. Thus, u and v' are reachable from each other, thereby contradicting
the assumption that C and C' are distinct strongly connected components. ■

We shall see that by considering vertices in the second depth-first search in
decreasing order of the finishing times that were computed in the first depth-first
search, we are, in essence, visiting the vertices of the component graph (each of
which corresponds to a strongly connected component of G) in topologically sorted
order.

Because  the  Strongly-Connected-Components  procedure  performs  two 
depth-first  searches,  there  is  the  potential  for  ambiguity  when  we  discuss  u.d 
or  u.f.  In  this  section,  these  values  always  refer  to  the  discovery  and  finishing 
times  as  computed  by  the  first  call  of  DFS,  in  line  1. 

We extend the notation for discovery and finishing times to sets of vertices.
If U ⊆ V, then we define d(U) = min {u.d : u ∈ U} and f(U) = max {u.f : u ∈ U}.
That is, d(U) and f(U) are the earliest discovery time and latest finishing time,
respectively, of any vertex in U.

The following lemma and its corollary give a key property relating strongly
connected components and finishing times in the first depth-first search.

Lemma  22.14 

Let C and C' be distinct strongly connected components in directed graph G =
(V, E). Suppose that there is an edge (u, v) ∈ E, where u ∈ C and v ∈ C'. Then
f(C) > f(C').

Proof We consider two cases, depending on which strongly connected
component, C or C', had the first discovered vertex during the depth-first search.

If d(C) < d(C'), let x be the first vertex discovered in C. At time x.d, all
vertices in C and C' are white. At that time, G contains a path from x to each vertex
in C consisting only of white vertices. Because (u, v) ∈ E, for any vertex w ∈ C',
there is also a path in G at time x.d from x to w consisting only of white vertices:
x ⇝ u → v ⇝ w. By the white-path theorem, all vertices in C and C' become
descendants of x in the depth-first tree. By Corollary 22.8, x has the latest finishing
time of any of its descendants, and so x.f = f(C) > f(C').

If instead we have d(C) > d(C'), let y be the first vertex discovered in C'.
At time y.d, all vertices in C' are white and G contains a path from y to each
vertex in C' consisting only of white vertices. By the white-path theorem, all
vertices in C' become descendants of y in the depth-first tree, and by Corollary 22.8,
y.f = f(C'). At time y.d, all vertices in C are white. Since there is an edge (u, v)
from C to C', Lemma 22.13 implies that there cannot be a path from C' to C.
Hence, no vertex in C is reachable from y. At time y.f, therefore, all vertices in C
are still white. Thus, for any vertex w ∈ C, we have w.f > y.f, which implies
that f(C) > f(C'). ■

The following corollary tells us that each edge in G^T that goes between different
strongly connected components goes from a component with an earlier finishing
time (in the first depth-first search) to a component with a later finishing time.

Corollary  22.15 

Let C and C' be distinct strongly connected components in directed graph G =
(V, E). Suppose that there is an edge (u, v) ∈ E^T, where u ∈ C and v ∈ C'. Then
f(C) < f(C').

Proof Since (u, v) ∈ E^T, we have (v, u) ∈ E. Because the strongly
connected components of G and G^T are the same, Lemma 22.14 implies that
f(C) < f(C'). ■

Corollary 22.15 provides the key to understanding why the strongly connected
components algorithm works. Let us examine what happens when we perform the
second depth-first search, which is on G^T. We start with the strongly connected
component C whose finishing time f(C) is maximum. The search starts from
some vertex x ∈ C, and it visits all vertices in C. By Corollary 22.15, G^T contains
no edges from C to any other strongly connected component, and so the search
from x will not visit vertices in any other component. Thus, the tree rooted at x
contains exactly the vertices of C. Having completed visiting all vertices in C,
the search in line 3 selects as a root a vertex from some other strongly connected
component C' whose finishing time f(C') is maximum over all components other
than C. Again, the search will visit all vertices in C', but by Corollary 22.15,
the only edges in G^T from C' to any other component must be to C, which we
have already visited. In general, when the depth-first search of G^T in line 3 visits
any strongly connected component, any edges out of that component must be to
components that the search already visited. Each depth-first tree, therefore, will be
exactly one strongly connected component. The following theorem formalizes this
argument.

Theorem  22.16 

The  Strongly-Connected-Components  procedure  correctly  computes  the 
strongly  connected  components  of  the  directed  graph  G  provided  as  its  input. 

Proof We argue by induction on the number of depth-first trees found in the
depth-first search of G^T in line 3 that the vertices of each tree form a strongly
connected component. The inductive hypothesis is that the first k trees produced
in line 3 are strongly connected components. The basis for the induction, when
k = 0, is trivial.

In the inductive step, we assume that each of the first k depth-first trees produced
in line 3 is a strongly connected component, and we consider the (k + 1)st tree
produced. Let the root of this tree be vertex u, and let u be in strongly connected
component C. Because of how we choose roots in the depth-first search in line 3,
u.f = f(C) > f(C') for any strongly connected component C' other than C
that has yet to be visited. By the inductive hypothesis, at the time that the search
visits u, all other vertices of C are white. By the white-path theorem, therefore, all
other vertices of C are descendants of u in its depth-first tree. Moreover, by the
inductive hypothesis and by Corollary 22.15, any edges in G^T that leave C must be
to strongly connected components that have already been visited. Thus, no vertex
in any strongly connected component other than C will be a descendant of u during
the depth-first search of G^T. Thus, the vertices of the depth-first tree in G^T that is
rooted at u form exactly one strongly connected component, which completes the
inductive step and the proof. ■

Here is another way to look at how the second depth-first search operates.
Consider the component graph (G^T)^SCC of G^T. If we map each strongly connected
component visited in the second depth-first search to a vertex of (G^T)^SCC, the
second depth-first search visits vertices of (G^T)^SCC in the reverse of a topologically
sorted order. If we reverse the edges of (G^T)^SCC, we get the graph ((G^T)^SCC)^T.
Because ((G^T)^SCC)^T = G^SCC (see Exercise 22.5-4), the second depth-first search
visits the vertices of G^SCC in topologically sorted order.

Exercises 


22.5-1 

How  can  the  number  of  strongly  connected  components  of  a  graph  change  if  a  new 
edge  is  added? 


22.5-2 

Show  how  the  procedure  Strongly-Connected-Components  works  on  the 
graph  of  Figure  22.6.  Specifically,  show  the  finishing  times  computed  in  line  1  and 
the  forest  produced  in  line  3.  Assume  that  the  loop  of  lines  5-7  of  DFS  considers 
vertices  in  alphabetical  order  and  that  the  adjacency  lists  are  in  alphabetical  order. 


22.5-3

Professor  Bacon  claims  that  the  algorithm  for  strongly  connected  components 
would  be  simpler  if  it  used  the  original  (instead  of  the  transpose)  graph  in  the 
second  depth-first  search  and  scanned  the  vertices  in  order  of  increasing  finishing 
times.  Does  this  simpler  algorithm  always  produce  correct  results? 

22.5-4

Prove that for any directed graph G, we have ((G^T)^SCC)^T = G^SCC. That is, the
transpose of the component graph of G^T is the same as the component graph of G.


22.5-5 

Give an O(V + E)-time algorithm to compute the component graph of a directed
graph  G  =  (V,  E).  Make  sure  that  there  is  at  most  one  edge  between  two  vertices 
in  the  component  graph  your  algorithm  produces. 


22.5-6

Given a directed graph G = (V, E), explain how to create another graph G' =
(V, E') such that (a) G' has the same strongly connected components as G, (b) G'
has the same component graph as G, and (c) E' is as small as possible. Describe a
fast algorithm to compute G'.


22.5-7 

A directed graph G = (V, E) is semiconnected if, for all pairs of vertices u, v ∈ V,
we have u ⇝ v or v ⇝ u. Give an efficient algorithm to determine whether
or not G is semiconnected. Prove that your algorithm is correct, and analyze its
running time.


Problems 


22-1  Classifying  edges  by  breadth-first  search 

A  depth-first  forest  classifies  the  edges  of  a  graph  into  tree,  back,  forward,  and 
cross  edges.  A  breadth-first  tree  can  also  be  used  to  classify  the  edges  reachable 
from  the  source  of  the  search  into  the  same  four  categories. 

a. Prove that in a breadth-first search of an undirected graph, the following
properties hold:

1. There are no back edges and no forward edges.

2. For each tree edge (u, v), we have v.d = u.d + 1.

3. For each cross edge (u, v), we have v.d = u.d or v.d = u.d + 1.

b.  Prove  that  in  a  breadth-first  search  of  a  directed  graph,  the  following  properties 
hold: 

1.  There  are  no  forward  edges. 

2. For each tree edge (u, v), we have v.d = u.d + 1.

3. For each cross edge (u, v), we have v.d ≤ u.d + 1.

4. For each back edge (u, v), we have 0 ≤ v.d ≤ u.d.

22-2 Articulation points, bridges, and biconnected components

Let G = (V, E) be a connected, undirected graph. An articulation point of G is
a vertex whose removal disconnects G. A bridge of G is an edge whose removal
disconnects G. A biconnected component of G is a maximal set of edges such
that any two edges in the set lie on a common simple cycle. Figure 22.10 illustrates
these definitions. We can determine articulation points, bridges, and biconnected
components using depth-first search. Let G_π = (V, E_π) be a depth-first tree of G.

Figure 22.10 The articulation points, bridges, and biconnected components of a connected,
undirected graph for use in Problem 22-2. The articulation points are the heavily shaded vertices, the
bridges are the heavily shaded edges, and the biconnected components are the edges in the shaded
regions, with a bcc numbering shown.

a. Prove that the root of G_π is an articulation point of G if and only if it has at
least two children in G_π.

b. Let v be a nonroot vertex of G_π. Prove that v is an articulation point of G if and
only if v has a child s such that there is no back edge from s or any descendant
of s to a proper ancestor of v.

c. Let

v.low = min { v.d, w.d : (u, w) is a back edge for some descendant u of v }.

Show how to compute v.low for all vertices v ∈ V in O(E) time.

d. Show how to compute all articulation points in O(E) time.

e. Prove that an edge of G is a bridge if and only if it does not lie on any simple
cycle of G.

f. Show how to compute all the bridges of G in O(E) time.

g. Prove that the biconnected components of G partition the nonbridge edges of G.

h. Give an O(E)-time algorithm to label each edge e of G with a positive
integer e.bcc such that e.bcc = e'.bcc if and only if e and e' are in the same
biconnected component.

22-3 Euler tour

An Euler tour of a strongly connected, directed graph G = (V, E) is a cycle that
traverses each edge of G exactly once, although it may visit a vertex more than
once.

a. Show that G has an Euler tour if and only if in-degree(v) = out-degree(v) for
each vertex v ∈ V.

b. Describe an O(E)-time algorithm to find an Euler tour of G if one exists. (Hint:
Merge edge-disjoint cycles.)

22-4  Reachability 

Let G = (V, E) be a directed graph in which each vertex u ∈ V is labeled with
a unique integer L(u) from the set {1, 2, ..., |V|}. For each vertex u ∈ V, let
R(u) = {v ∈ V : u ⇝ v} be the set of vertices that are reachable from u. Define
min(u) to be the vertex in R(u) whose label is minimum, i.e., min(u) is the vertex v
such that L(v) = min {L(w) : w ∈ R(u)}. Give an O(V + E)-time algorithm that
computes min(u) for all vertices u ∈ V.


Chapter  notes 

Even  [103]  and  Tarjan  [330]  are  excellent  references  for  graph  algorithms. 

Breadth-first  search  was  discovered  by  Moore  [260]  in  the  context  of  finding 
paths  through  mazes.  Lee  [226]  independently  discovered  the  same  algorithm  in 
the  context  of  routing  wires  on  circuit  boards. 

Hopcroft  and  Tarjan  [178]  advocated  the  use  of  the  adjacency-list  representation 
over  the  adjacency-matrix  representation  for  sparse  graphs  and  were  the  first  to 
recognize  the  algorithmic  importance  of  depth-first  search.  Depth-first  search  has 
been  widely  used  since  the  late  1950s,  especially  in  artificial  intelligence  programs. 

Tarjan [327] gave a linear-time algorithm for finding strongly connected
components. The algorithm for strongly connected components in Section 22.5 is adapted
from Aho, Hopcroft, and Ullman [6], who credit it to S. R. Kosaraju (unpublished)
and M. Sharir [314]. Gabow [119] also developed an algorithm for strongly
connected components that is based on contracting cycles and uses two stacks to make
it run in linear time. Knuth [209] was the first to give a linear-time algorithm for
topological sorting.


23 Minimum Spanning Trees

Electronic circuit designs often need to make the pins of several components
electrically equivalent by wiring them together. To interconnect a set of n pins, we can
use an arrangement of n − 1 wires, each connecting two pins. Of all such
arrangements, the one that uses the least amount of wire is usually the most desirable.

We can model this wiring problem with a connected, undirected graph G =
(V, E), where V is the set of pins, E is the set of possible interconnections between
pairs of pins, and for each edge (u, v) ∈ E, we have a weight w(u, v) specifying
the cost (amount of wire needed) to connect u and v. We then wish to find an
acyclic subset T ⊆ E that connects all of the vertices and whose total weight

w(T) = Σ_{(u,v) ∈ T} w(u, v)

is minimized. Since T is acyclic and connects all of the vertices, it must form a tree,
which we call a spanning tree since it “spans” the graph G. We call the problem of
determining the tree T the minimum-spanning-tree problem.¹ Figure 23.1 shows
an example of a connected graph and a minimum spanning tree.
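
To make the definitions concrete, the following Python sketch checks that a
candidate edge set T is acyclic and connects all of the vertices, and then returns
its total weight w(T). The function name spanning_tree_weight, the dictionary w
keyed by edge pairs, and the small union-find helper used to detect cycles are
assumptions of this sketch. It only evaluates a given tree; it does not search for
a minimum one.

```python
def spanning_tree_weight(vertices, tree_edges, w):
    """Return w(T) if tree_edges forms a spanning tree; raise ValueError otherwise.

    vertices is a collection of vertices, tree_edges a list of pairs (u, v),
    and w a dictionary mapping each pair in tree_edges to its weight
    (all hypothetical representations).
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the set representative, halving the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    total = 0
    for u, v in tree_edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            raise ValueError("edge set contains a cycle")
        parent[root_u] = root_v
        total += w[(u, v)]

    # An acyclic edge set connects all vertices exactly when it has |V| - 1 edges.
    if len(tree_edges) != len(vertices) - 1:
        raise ValueError("edge set does not connect all vertices")
    return total
```

The final check relies on the fact that every spanning tree has exactly |V| − 1
edges, so an acyclic set with fewer edges must leave some vertex disconnected.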

In this chapter, we shall examine two algorithms for solving the
minimum-spanning-tree problem: Kruskal’s algorithm and Prim’s algorithm. We can easily
make each of them run in time O(E lg V) using ordinary binary heaps. By using
Fibonacci heaps, Prim’s algorithm runs in time O(E + V lg V), which improves
over the binary-heap implementation if |V| is much smaller than |E|.

The two algorithms are greedy algorithms, as described in Chapter 16. Each
step of a greedy algorithm must make one of several possible choices. The greedy
strategy advocates making the choice that is the best at the moment. Such a strategy
does not generally guarantee that it will always find globally optimal solutions
to problems. For the minimum-spanning-tree problem, however, we can prove that
certain greedy strategies do yield a spanning tree with minimum weight. Although
you can read this chapter independently of Chapter 16, the greedy methods presented
here are a classic application of the theoretical notions introduced there.

¹ The phrase “minimum spanning tree” is a shortened form of the phrase “minimum-weight spanning
tree.” We are not, for example, minimizing the number of edges in T, since all spanning trees have
exactly $|V| - 1$ edges by Theorem B.2.

Figure 23.1 A minimum spanning tree for a connected graph. The weights on edges are shown,
and the edges in a minimum spanning tree are shaded. The total weight of the tree shown is 37. This
minimum spanning tree is not unique: removing the edge (b, c) and replacing it with the edge (a, h)
yields another spanning tree with weight 37.

Section 23.1 introduces a “generic” minimum-spanning-tree method that grows
a  spanning  tree  by  adding  one  edge  at  a  time.  Section  23.2  gives  two  algorithms 
that  implement  the  generic  method.  The  first  algorithm,  due  to  Kruskal,  is  similar 
to  the  connected-components  algorithm  from  Section  21.1.  The  second,  due  to 
Prim,  resembles  Dijkstra’s  shortest-paths  algorithm  (Section  24.3). 

Because  a  tree  is  a  type  of  graph,  in  order  to  be  precise  we  must  define  a  tree  in 
terms  of  not  just  its  edges,  but  its  vertices  as  well.  Although  this  chapter  focuses 
on  trees  in  terms  of  their  edges,  we  shall  operate  with  the  understanding  that  the 
vertices  of  a  tree  T  are  those  that  some  edge  of  T  is  incident  on. 


23.1  Growing  a  minimum  spanning  tree 

Assume that we have a connected, undirected graph G = (V, E) with a weight
function $w : E \to \mathbb{R}$, and we wish to find a minimum spanning tree for G. The
two  algorithms  we  consider  in  this  chapter  use  a  greedy  approach  to  the  problem, 
although  they  differ  in  how  they  apply  this  approach. 

This  greedy  strategy  is  captured  by  the  following  generic  method,  which  grows 
the  minimum  spanning  tree  one  edge  at  a  time.  The  generic  method  manages  a  set 
of  edges  A,  maintaining  the  following  loop  invariant: 

Prior  to  each  iteration,  A  is  a  subset  of  some  minimum  spanning  tree. 

At each step, we determine an edge (u, v) that we can add to A without violating
this invariant, in the sense that $A \cup \{(u, v)\}$ is also a subset of a minimum spanning
tree. We call such an edge a safe edge for A, since we can add it safely to A while
maintaining the invariant.

GENERIC-MST(G, w)
1  A = ∅
2  while A does not form a spanning tree
3      find an edge (u, v) that is safe for A
4      A = A ∪ {(u, v)}
5  return A

We  use  the  loop  invariant  as  follows: 

Initialization:  After  line  1,  the  set  A  trivially  satisfies  the  loop  invariant. 

Maintenance:  The  loop  in  lines  2-4  maintains  the  invariant  by  adding  only  safe 
edges. 

Termination:  All  edges  added  to  A  are  in  a  minimum  spanning  tree,  and  so  the 
set  A  returned  in  line  5  must  be  a  minimum  spanning  tree. 

The tricky part is, of course, finding a safe edge in line 3. One must exist, since
when line 3 is executed, the invariant dictates that there is a spanning tree T such
that $A \subseteq T$. Within the while loop body, A must be a proper subset of T, and
therefore there must be an edge $(u, v) \in T$ such that $(u, v) \notin A$ and (u, v) is safe
for A.

In  the  remainder  of  this  section,  we  provide  a  rule  (Theorem  23.1)  for  recogniz¬ 
ing  safe  edges.  The  next  section  describes  two  algorithms  that  use  this  rule  to  find 
safe  edges  efficiently. 

We first need some definitions. A cut $(S, V - S)$ of an undirected graph G = (V, E)
is a partition of V. Figure 23.2 illustrates this notion. We say that an edge
$(u, v) \in E$ crosses the cut $(S, V - S)$ if one of its endpoints is in S and the other
is in $V - S$. We say that a cut respects a set A of edges if no edge in A crosses the
cut. An edge is a light edge crossing a cut if its weight is the minimum of any edge
crossing the cut. Note that there can be more than one light edge crossing a cut in
the case of ties. More generally, we say that an edge is a light edge satisfying a
given property if its weight is the minimum of any edge satisfying the property.
Our rule for recognizing safe edges is given by the following theorem.

Theorem  23.1 

Let G = (V, E) be a connected, undirected graph with a real-valued weight function
w defined on E. Let A be a subset of E that is included in some minimum
spanning tree for G, let $(S, V - S)$ be any cut of G that respects A, and let (u, v)
be a light edge crossing $(S, V - S)$. Then, edge (u, v) is safe for A.
Figure 23.2 Two ways of viewing a cut $(S, V - S)$ of the graph from Figure 23.1. (a) Black
vertices are in the set S, and white vertices are in $V - S$. The edges crossing the cut are those
connecting white vertices with black vertices. The edge (d, c) is the unique light edge crossing the
cut. A subset A of the edges is shaded; note that the cut $(S, V - S)$ respects A, since no edge of A
crosses the cut. (b) The same graph with the vertices in the set S on the left and the vertices in the
set $V - S$ on the right. An edge crosses the cut if it connects a vertex on the left with a vertex on the
right.
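
The definitions of a crossing edge, a cut that respects A, and a light edge can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names, the representation of edges as (u, v, weight) triples, and the small four-vertex graph are assumptions chosen for the example and do not come from the text.

```python
def crossing_edges(edges, S):
    """Return the edges (u, v, w) with exactly one endpoint in S."""
    return [(u, v, w) for (u, v, w) in edges if (u in S) != (v in S)]


def respects(A, S):
    """Return True if no edge (u, v) of A crosses the cut (S, V - S)."""
    return all((u in S) == (v in S) for (u, v) in A)


def light_edges(edges, S):
    """Return every minimum-weight edge crossing the cut (ties included)."""
    crossing = crossing_edges(edges, S)
    if not crossing:
        return []
    least = min(w for (_, _, w) in crossing)
    return [e for e in crossing if e[2] == least]


# Hypothetical example graph, not taken from any figure in the chapter.
edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 1), ("a", "d", 3), ("a", "c", 2)]
S = {"a", "b"}

crossing = crossing_edges(edges, S)   # (b, c), (a, d), and (a, c) cross the cut
light = light_edges(edges, S)         # (b, c) and (a, c) tie at weight 2
```

Here the cut ({a, b}, {c, d}) respects A = {(a, b)} but not A = {(b, c)}, and the tie between (b, c) and (a, c) shows that a light edge need not be unique.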


Proof Let T be a minimum spanning tree that includes A, and assume that T
does not contain the light edge (u, v), since if it does, we are done. We shall
construct another minimum spanning tree T' that includes $A \cup \{(u, v)\}$ by using a
cut-and-paste technique, thereby showing that (u, v) is a safe edge for A.

The edge (u, v) forms a cycle with the edges on the simple path p from u
to v in T, as Figure 23.3 illustrates. Since u and v are on opposite sides of the
cut $(S, V - S)$, at least one edge in T lies on the simple path p and also crosses
the cut. Let (x, y) be any such edge. The edge (x, y) is not in A, because the cut
respects A. Since (x, y) is on the unique simple path from u to v in T, removing
(x, y) breaks T into two components. Adding (u, v) reconnects them to form
a new spanning tree $T' = T - \{(x, y)\} \cup \{(u, v)\}$.

We next show that T' is a minimum spanning tree. Since (u, v) is a light edge
crossing $(S, V - S)$ and (x, y) also crosses this cut, $w(u, v) \le w(x, y)$. Therefore,

$$w(T') = w(T) - w(x, y) + w(u, v) \le w(T).$$




Chapter  23  Minimum  Spanning  Trees 


Figure 23.3 The proof of Theorem 23.1. Black vertices are in S, and white vertices are in $V - S$.
The edges in the minimum spanning tree T are shown, but the edges in the graph G are not. The
edges in A are shaded, and (u, v) is a light edge crossing the cut $(S, V - S)$. The edge (x, y) is
an edge on the unique simple path p from u to v in T. To form a minimum spanning tree T' that
contains (u, v), remove the edge (x, y) from T and add the edge (u, v).

But T is a minimum spanning tree, so that $w(T) \le w(T')$; thus, T' must be a
minimum spanning tree also.

It remains to show that (u, v) is actually a safe edge for A. We have $A \subseteq T'$,
since $A \subseteq T$ and $(x, y) \notin A$; thus, $A \cup \{(u, v)\} \subseteq T'$. Consequently, since T' is a
minimum spanning tree, (u, v) is safe for A. ∎
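
The cut-and-paste step of the proof can be traced on the example from the caption of Figure 23.1, where replacing (b, c) with (a, h) yields another spanning tree of weight 37. The sketch below is an illustration under stated assumptions: the function name `exchange` is hypothetical, and the tree edges and weights are my transcription of the shaded tree in Figure 23.1, which the text itself does not list edge by edge.

```python
def exchange(tree, S, light):
    """Return (T', (x, y)) where T' = T - {(x, y)} + {light}.

    tree  : list of (u, v, w) edges forming a spanning tree T
    S     : one side of a cut (S, V - S)
    light : a light edge (u, v, w) crossing the cut and not in T
    """
    u, v, _ = light
    # Adjacency lists of T, remembering which tree edge joins each pair.
    adj = {}
    for e in tree:
        adj.setdefault(e[0], []).append((e[1], e))
        adj.setdefault(e[1], []).append((e[0], e))
    # Depth-first search from u records how each vertex was reached.
    prev = {u: None}
    stack = [u]
    while stack:
        a = stack.pop()
        for b, e in adj[a]:
            if b not in prev:
                prev[b] = (a, e)
                stack.append(b)
    # Walk back from v to u to recover the unique simple path p in T.
    path = []
    node = v
    while prev[node] is not None:
        node, e = prev[node]
        path.append(e)
    # Some edge (x, y) on p must cross the cut; remove it and add the light edge.
    x_y = next(e for e in path if (e[0] in S) != (e[1] in S))
    return [e for e in tree if e != x_y] + [light], x_y


# Assumed transcription of the shaded tree in Figure 23.1 (total weight 37).
T = [("a", "b", 4), ("b", "c", 8), ("c", "d", 7), ("c", "f", 4),
     ("c", "i", 2), ("d", "e", 9), ("f", "g", 2), ("g", "h", 1)]
T_new, removed = exchange(T, S={"a", "b"}, light=("a", "h", 8))
```

With the cut ({a, b}, V - {a, b}), the path from a to h in T crosses the cut at (b, c), so that edge is removed, and the total weight is unchanged because both edges have weight 8.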

Theorem 23.1 gives us a better understanding of the workings of the GENERIC-MST
method on a connected graph G = (V, E). As the method proceeds, the
set A is always acyclic; otherwise, a minimum spanning tree including A would
contain a cycle, which is a contradiction. At any point in the execution, the graph
$G_A = (V, A)$ is a forest, and each of the connected components of $G_A$ is a tree.
(Some of the trees may contain just one vertex, as is the case, for example, when
the method begins: A is empty and the forest contains $|V|$ trees, one for each
vertex.) Moreover, any safe edge (u, v) for A connects distinct components of $G_A$,
since $A \cup \{(u, v)\}$ must be acyclic.

The while loop in lines 2-4 of GENERIC-MST executes $|V| - 1$ times because
it finds one of the $|V| - 1$ edges of a minimum spanning tree in each iteration.
Initially, when $A = \emptyset$, there are $|V|$ trees in $G_A$, and each iteration reduces that
number by 1. When the forest contains only a single tree, the method terminates.

The two algorithms in Section 23.2 use the following corollary to Theorem 23.1.
Corollary 23.2

Let G = (V, E) be a connected, undirected graph with a real-valued weight function
w defined on E. Let A be a subset of E that is included in some minimum
spanning tree for G, and let $C = (V_C, E_C)$ be a connected component (tree) in the
forest $G_A = (V, A)$. If (u, v) is a light edge connecting C to some other component
in $G_A$, then (u, v) is safe for A.

Proof The cut $(V_C, V - V_C)$ respects A, and (u, v) is a light edge for this cut.
Therefore, (u, v) is safe for A. ∎
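
Corollary 23.2 already suffices to turn GENERIC-MST into a working, if slow, program: repeatedly pick any component C of the forest and add a light edge connecting C to some other component. The following is a minimal sketch under assumptions of my own: the name `generic_mst`, the rule for choosing C (cycling through the vertex list), the naive relabeling of components, and the small example graph are all hypothetical and make no claim to efficiency.

```python
def generic_mst(vertices, edges):
    """Grow an MST of a connected graph by adding light edges out of a component.

    vertices : list of vertex names
    edges    : list of (u, v, w) triples
    """
    comp = {v: v for v in vertices}      # component label of each vertex in G_A
    A = []
    while len(A) < len(vertices) - 1:
        # Choose some component C; any choice is allowed by Corollary 23.2.
        c = comp[vertices[len(A) % len(vertices)]]
        # A light edge connecting C to some other component is safe for A.
        u, v, w = min(
            (e for e in edges if (comp[e[0]] == c) != (comp[e[1]] == c)),
            key=lambda e: e[2],
        )
        A.append((u, v, w))
        # Merge the other endpoint's component into C.
        old = comp[v] if comp[u] == c else comp[u]
        for x in vertices:
            if comp[x] == old:
                comp[x] = c
    return A


# Hypothetical example graph, not taken from any figure in the chapter.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 1), ("a", "d", 3), ("a", "c", 2)]
A = generic_mst(vertices, edges)
```

Each iteration scans every edge, so this sketch takes O(VE) time; the two algorithms of Section 23.2 differ from it mainly in how they organize the search for the next safe edge.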

Exercises 


23.1-1 

Let (u, v) be a minimum-weight edge in a connected graph G. Show that (u, v)
belongs to some minimum spanning tree of G.


23.1-2 

Professor Sabatier conjectures the following converse of Theorem 23.1. Let G =
(V, E) be a connected, undirected graph with a real-valued weight function w defined
on E. Let A be a subset of E that is included in some minimum spanning
tree for G, let $(S, V - S)$ be any cut of G that respects A, and let (u, v) be a safe
edge for A crossing $(S, V - S)$. Then, (u, v) is a light edge for the cut. Show that
the professor’s conjecture is incorrect by giving a counterexample.


23.1-3 

Show  that  if  an  edge  (u,  v)  is  contained  in  some  minimum  spanning  tree,  then  it  is 
a  light  edge  crossing  some  cut  of  the  graph. 


23.1-4 

Give a simple example of a connected graph such that the set of edges {(u, v) :
there exists a cut $(S, V - S)$ such that (u, v) is a light edge crossing $(S, V - S)$}
does not form a minimum spanning tree.


23.1-5 

Let e be a maximum-weight edge on some cycle of connected graph G = (V, E).
Prove that there is a minimum spanning tree of $G' = (V, E - \{e\})$ that is also a
minimum spanning tree of G. That is, there is a minimum spanning tree of G that
does not include e.

23.1-6

Show  that  a  graph  has  a  unique  minimum  spanning  tree  if,  for  every  cut  of  the 
graph,  there  is  a  unique  light  edge  crossing  the  cut.  Show  that  the  converse  is  not 
true  by  giving  a  counterexample. 


23.1-7 

Argue  that  if  all  edge  weights  of  a  graph  are  positive,  then  any  subset  of  edges  that 
connects  all  vertices  and  has  minimum  total  weight  must  be  a  tree.  Give  an  example 
to  show  that  the  same  conclusion  does  not  follow  if  we  allow  some  weights  to  be 
nonpositive. 


23.1-8 

Let  T  be  a  minimum  spanning  tree  of  a  graph  G,  and  let  L  be  the  sorted  list  of  the 
edge  weights  of  T.  Show  that  for  any  other  minimum  spanning  tree  T'  of  G,  the 
list  L  is  also  the  sorted  list  of  edge  weights  of  T'. 


23.1-9

Let T be a minimum spanning tree of a graph G = (V, E), and let V' be a subset
of V. Let T' be the subgraph of T induced by V', and let G' be the subgraph of G
induced by V'. Show that if T' is connected, then T' is a minimum spanning tree
of G'.

23.1-10

Given a graph G and a minimum spanning tree T, suppose that we decrease the
weight of one of the edges in T. Show that T is still a minimum spanning tree
for G. More formally, let T be a minimum spanning tree for G with edge weights
given by weight function w. Choose one edge $(x, y) \in T$ and a positive number k,
and define the weight function w' by

$$w'(u, v) = \begin{cases} w(u, v) & \text{if } (u, v) \ne (x, y), \\ w(x, y) - k & \text{if } (u, v) = (x, y). \end{cases}$$

Show that T is a minimum spanning tree for G with edge weights given by w'.

23.1-11 *

Given a graph G and a minimum spanning tree T, suppose that we decrease the
weight of one of the edges not in T. Give an algorithm for finding the minimum
spanning tree in the modified graph.
23.2 The algorithms of Kruskal and Prim

The two minimum-spanning-tree algorithms described in this section elaborate on
the generic method. They each use a specific rule to determine a safe edge in line 3
of GENERIC-MST. In Kruskal’s algorithm, the set A is a forest whose vertices are
all those of the given graph. The safe edge added to A is always a least-weight
edge in the graph that connects two distinct components. In Prim’s algorithm, the
set A forms a single tree. The safe edge added to A is always a least-weight edge
connecting the tree to a vertex not in the tree.

Kruskal’s  algorithm 

Kruskal’s algorithm finds a safe edge to add to the growing forest by finding, of all
the edges that connect any two trees in the forest, an edge (u, v) of least weight.
Let $C_1$ and $C_2$ denote the two trees that are connected by (u, v). Since (u, v) must
be a light edge connecting $C_1$ to some other tree, Corollary 23.2 implies that (u, v)
is a safe edge for $C_1$. Kruskal’s algorithm qualifies as a greedy algorithm because
at each step it adds to the forest an edge of least possible weight.

Our implementation of Kruskal’s algorithm is like the algorithm to compute
connected components from Section 21.1. It uses a disjoint-set data structure to
maintain several disjoint sets of elements. Each set contains the vertices in one tree
of the current forest. The operation FIND-SET(u) returns a representative element
from the set that contains u. Thus, we can determine whether two vertices u and v
belong to the same tree by testing whether FIND-SET(u) equals FIND-SET(v). To
combine trees, Kruskal’s algorithm calls the UNION procedure.

MST-KRUSKAL(G, w)
1  A = ∅
2  for each vertex v ∈ G.V
3      MAKE-SET(v)
4  sort the edges of G.E into nondecreasing order by weight w
5  for each edge (u, v) ∈ G.E, taken in nondecreasing order by weight
6      if FIND-SET(u) ≠ FIND-SET(v)
7          A = A ∪ {(u, v)}
8          UNION(u, v)
9  return A

Figure 23.4 The execution of Kruskal’s algorithm on the graph from Figure 23.1. Shaded edges
belong to the forest A being grown. The algorithm considers each edge in sorted order by weight.
An arrow points to the edge under consideration at each step of the algorithm. If the edge joins two
distinct trees in the forest, it is added to the forest, thereby merging the two trees.

Figure 23.4 shows how Kruskal’s algorithm works. Lines 1-3 initialize the set A
to the empty set and create $|V|$ trees, one containing each vertex. The for loop in
lines 5-8 examines edges in order of weight, from lowest to highest. The loop
checks, for each edge (u, v), whether the endpoints u and v belong to the same
tree. If they do, then the edge (u, v) cannot be added to the forest without creating
a cycle, and the edge is discarded. Otherwise, the two vertices belong to different
trees. In this case, line 7 adds the edge (u, v) to A, and line 8 merges the vertices
in the two trees.
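
As a hedged illustration of MST-KRUSKAL, the following Python sketch uses a small dictionary-based disjoint-set forest with union by rank and path compression. The helper names and the data layout are assumptions made for this example rather than the book's code, and the edge list is my transcription of the graph in Figure 23.1, which the text does not list explicitly; the text does state that its minimum spanning tree has weight 37.

```python
def mst_kruskal(vertices, edges):
    """Return a list of MST edges (u, v, w) for a connected, undirected graph."""
    parent = {v: v for v in vertices}    # MAKE-SET for every vertex
    rank = {v: 0 for v in vertices}

    def find_set(x):
        # Locate the root, then point every vertex on the path directly at it.
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(x, y):
        # Union by rank: hang the shallower tree under the deeper one.
        rx, ry = find_set(x), find_set(y)
        if rank[rx] < rank[ry]:
            rx, ry = ry, rx
        parent[ry] = rx
        if rank[rx] == rank[ry]:
            rank[rx] += 1

    A = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):   # nondecreasing weight
        if find_set(u) != find_set(v):                  # endpoints in different trees
            A.append((u, v, w))
            union(u, v)
    return A


# Assumed transcription of the graph in Figure 23.1.
EDGES = [("a", "b", 4), ("a", "h", 8), ("b", "c", 8), ("b", "h", 11),
         ("c", "d", 7), ("c", "f", 4), ("c", "i", 2), ("d", "e", 9),
         ("d", "f", 14), ("e", "f", 10), ("f", "g", 2), ("g", "h", 1),
         ("g", "i", 6), ("h", "i", 7)]
tree = mst_kruskal("abcdefghi", EDGES)
total = sum(w for (_, _, w) in tree)
```

Which of the two weight-8 edges, (a, h) or (b, c), ends up in the tree depends on how the sort breaks the tie; the total weight is the same either way.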

The running time of Kruskal’s algorithm for a graph G = (V, E) depends
on how we implement the disjoint-set data structure. We assume that we use
the disjoint-set-forest implementation of Section 21.3 with the union-by-rank and
path-compression heuristics, since it is the asymptotically fastest implementation
known. Initializing the set A in line 1 takes $O(1)$ time, and the time to sort the
edges in line 4 is $O(E \lg E)$. (We will account for the cost of the $|V|$ MAKE-SET
operations in the for loop of lines 2-3 in a moment.) The for loop of lines 5-8
performs $O(E)$ FIND-SET and UNION operations on the disjoint-set forest. Along
with the $|V|$ MAKE-SET operations, these take a total of $O((V + E)\,\alpha(V))$ time,
where $\alpha$ is the very slowly growing function defined in Section 21.4. Because we
assume that G is connected, we have $|E| \ge |V| - 1$, and so the disjoint-set operations
take $O(E\,\alpha(V))$ time. Moreover, since $\alpha(|V|) = O(\lg V) = O(\lg E)$, the total
running time of Kruskal’s algorithm is $O(E \lg E)$. Observing that $|E| < |V|^2$,
we have $\lg |E| = O(\lg V)$, and so we can restate the running time of Kruskal’s
algorithm as $O(E \lg V)$.
Prim’s algorithm

Like Kruskal’s algorithm, Prim’s algorithm is a special case of the generic
minimum-spanning-tree method from Section 23.1. Prim’s algorithm operates much
like Dijkstra’s algorithm for finding shortest paths in a graph, which we shall see in
Section 24.3. Prim’s algorithm has the property that the edges in the set A always
form a single tree. As Figure 23.5 shows, the tree starts from an arbitrary root
vertex r and grows until the tree spans all the vertices in V. Each step adds to the
tree A a light edge that connects A to an isolated vertex, that is, one on which no edge
of A is incident. By Corollary 23.2, this rule adds only edges that are safe for A;
therefore, when the algorithm terminates, the edges in A form a minimum spanning
tree. This strategy qualifies as greedy since at each step it adds to the tree an edge
that contributes the minimum amount possible to the tree’s weight.

In order to implement Prim’s algorithm efficiently, we need a fast way to select
a new edge to add to the tree formed by the edges in A. In the pseudocode below,
the connected graph G and the root r of the minimum spanning tree to be grown
are inputs to the algorithm. During execution of the algorithm, all vertices that
are not in the tree reside in a min-priority queue Q based on a key attribute. For
each vertex v, the attribute v.key is the minimum weight of any edge connecting v
to a vertex in the tree; by convention, $v.key = \infty$ if there is no such edge. The
attribute $v.\pi$ names the parent of v in the tree. The algorithm implicitly maintains
the set A from GENERIC-MST as

$$A = \{(v, v.\pi) : v \in V - \{r\} - Q\}.$$

When the algorithm terminates, the min-priority queue Q is empty; the minimum
spanning tree A for G is thus

$$A = \{(v, v.\pi) : v \in V - \{r\}\}.$$

MST-PRIM(G, w, r)
1   for each u ∈ G.V
2       u.key = ∞
3       u.π = NIL
4   r.key = 0
5   Q = G.V
6   while Q ≠ ∅
7       u = EXTRACT-MIN(Q)
8       for each v ∈ G.Adj[u]
9           if v ∈ Q and w(u, v) < v.key
10              v.π = u
11              v.key = w(u, v)
Figure 23.5 The execution of Prim’s algorithm on the graph from Figure 23.1. The root vertex
is  a.  Shaded  edges  are  in  the  tree  being  grown,  and  black  vertices  are  in  the  tree.  At  each  step  of 
the  algorithm,  the  vertices  in  the  tree  determine  a  cut  of  the  graph,  and  a  light  edge  crossing  the  cut 
is  added  to  the  tree.  In  the  second  step,  for  example,  the  algorithm  has  a  choice  of  adding  either 
edge  (b,  c)  or  edge  (a,  h)  to  the  tree  since  both  are  light  edges  crossing  the  cut. 




Figure 23.5 shows how Prim’s algorithm works. Lines 1-5 set the key of each
vertex to $\infty$ (except for the root r, whose key is set to 0 so that it will be the
first vertex processed), set the parent of each vertex to NIL, and initialize the
min-priority queue Q to contain all the vertices. The algorithm maintains the following
three-part loop invariant:

Prior to each iteration of the while loop of lines 6-11,

1. $A = \{(v, v.\pi) : v \in V - \{r\} - Q\}$.

2. The vertices already placed into the minimum spanning tree are those in
$V - Q$.

3. For all vertices $v \in Q$, if $v.\pi \ne$ NIL, then $v.key < \infty$ and v.key is
the weight of a light edge $(v, v.\pi)$ connecting v to some vertex already
placed into the minimum spanning tree.

Line 7 identifies a vertex $u \in Q$ incident on a light edge that crosses the cut
$(V - Q, Q)$ (with the exception of the first iteration, in which u = r due to line 4).
Removing u from the set Q adds it to the set $V - Q$ of vertices in the tree, thus
adding $(u, u.\pi)$ to A. The for loop of lines 8-11 updates the key and $\pi$ attributes
of every vertex v adjacent to u but not in the tree, thereby maintaining the third
part of the loop invariant.
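
The following Python sketch mirrors MST-PRIM with one deliberate substitution that should be read as an assumption, not as the book's method: Python's `heapq` module has no DECREASE-KEY operation, so instead of updating a vertex's position in Q, the code pushes a fresh (key, vertex) pair and skips stale pairs when they are popped. The names `mst_prim`, `in_tree`, and `adj` are hypothetical, and the edge list is my transcription of the graph in Figure 23.1.

```python
import heapq
from math import inf


def mst_prim(adj, r):
    """Return (pi, key) for a connected graph given as adjacency lists.

    adj : dict mapping each vertex to a list of (neighbor, weight) pairs
    r   : root vertex of the tree to be grown
    """
    key = {v: inf for v in adj}
    pi = {v: None for v in adj}
    key[r] = 0
    in_tree = set()                  # plays the role of V - Q
    heap = [(0, r)]
    while heap:
        _, u = heapq.heappop(heap)   # EXTRACT-MIN, possibly a stale entry
        if u in in_tree:
            continue                 # u was already extracted with a smaller key
        in_tree.add(u)
        for v, w in adj[u]:
            if v not in in_tree and w < key[v]:
                pi[v] = u
                key[v] = w
                heapq.heappush(heap, (w, v))   # stands in for DECREASE-KEY
    return pi, key


# Assumed transcription of the graph in Figure 23.1.
EDGES = [("a", "b", 4), ("a", "h", 8), ("b", "c", 8), ("b", "h", 11),
         ("c", "d", 7), ("c", "f", 4), ("c", "i", 2), ("d", "e", 9),
         ("d", "f", 14), ("e", "f", 10), ("f", "g", 2), ("g", "h", 1),
         ("g", "i", 6), ("h", "i", 7)]
adj = {v: [] for v in "abcdefghi"}
for u, v, w in EDGES:
    adj[u].append((v, w))
    adj[v].append((u, w))

pi, key = mst_prim(adj, "a")
total = sum(key.values())            # weight of the tree {(v, pi[v]) : v != r}
```

Because stale entries remain in the heap, it can hold up to O(E) pairs rather than $|V|$ vertices; since $\lg |E| = O(\lg V)$, this variant still runs in $O(E \lg V)$ time.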

The running time of Prim’s algorithm depends on how we implement the
min-priority queue Q. If we implement Q as a binary min-heap (see Chapter 6), we
can use the BUILD-MIN-HEAP procedure to perform lines 1-5 in $O(V)$ time. The
body of the while loop executes $|V|$ times, and since each EXTRACT-MIN operation
takes $O(\lg V)$ time, the total time for all calls to EXTRACT-MIN is $O(V \lg V)$.
The for loop in lines 8-11 executes $O(E)$ times altogether, since the sum of the
lengths of all adjacency lists is $2|E|$. Within the for loop, we can implement the
test for membership in Q in line 9 in constant time by keeping a bit for each vertex
that tells whether or not it is in Q, and updating the bit when the vertex is removed
from Q. The assignment in line 11 involves an implicit DECREASE-KEY operation
on the min-heap, which a binary min-heap supports in $O(\lg V)$ time. Thus,
the total time for Prim’s algorithm is $O(V \lg V + E \lg V) = O(E \lg V)$, which is
asymptotically the same as for our implementation of Kruskal’s algorithm.

We can improve the asymptotic running time of Prim’s algorithm by using Fibonacci
heaps. Chapter 19 shows that if a Fibonacci heap holds $|V|$ elements, an
EXTRACT-MIN operation takes $O(\lg V)$ amortized time and a DECREASE-KEY
operation (to implement line 11) takes $O(1)$ amortized time. Therefore, if we use a
Fibonacci heap to implement the min-priority queue Q, the running time of Prim’s
algorithm improves to $O(E + V \lg V)$.
Exercises


23.2-1 

Kruskal’s  algorithm  can  return  different  spanning  trees  for  the  same  input  graph  G, 
depending  on  how  it  breaks  ties  when  the  edges  are  sorted  into  order.  Show  that 
for  each  minimum  spanning  tree  T  of  G,  there  is  a  way  to  sort  the  edges  of  G  in 
Kruskal’s  algorithm  so  that  the  algorithm  returns  T . 


23.2-2 

Suppose that we represent the graph G = (V, E) as an adjacency matrix. Give a
simple implementation of Prim’s algorithm for this case that runs in $O(V^2)$ time.


23.2-3 

For a sparse graph G = (V, E), where $|E| = \Theta(V)$, is the implementation of
Prim’s algorithm with a Fibonacci heap asymptotically faster than the binary-heap
implementation? What about for a dense graph, where $|E| = \Theta(V^2)$? How
must the sizes $|E|$ and $|V|$ be related for the Fibonacci-heap implementation to
be asymptotically faster than the binary-heap implementation?


23.2-4 

Suppose that all edge weights in a graph are integers in the range from 1 to $|V|$.
How fast can you make Kruskal’s algorithm run? What if the edge weights are
integers in the range from 1 to W for some constant W?


23.2-5

Suppose that all edge weights in a graph are integers in the range from 1 to $|V|$.
How fast can you make Prim’s algorithm run? What if the edge weights are integers
in the range from 1 to W for some constant W?

23.2-6 *

Suppose that the edge weights in a graph are uniformly distributed over the half-open
interval [0, 1). Which algorithm, Kruskal’s or Prim’s, can you make run
faster?


23.2-7  * 

Suppose  that  a  graph  G  has  a  minimum  spanning  tree  already  computed.  How 
quickly  can  we  update  the  minimum  spanning  tree  if  we  add  a  new  vertex  and 
incident  edges  to  G? 


23.2-8 

Professor Borden proposes a new divide-and-conquer algorithm for computing
minimum spanning trees, which goes as follows. Given a graph G = (V, E),
partition the set V of vertices into two sets $V_1$ and $V_2$ such that $|V_1|$ and $|V_2|$ differ
by at most 1. Let $E_1$ be the set of edges that are incident only on vertices in $V_1$, and
let $E_2$ be the set of edges that are incident only on vertices in $V_2$. Recursively solve
a minimum-spanning-tree problem on each of the two subgraphs $G_1 = (V_1, E_1)$
and $G_2 = (V_2, E_2)$. Finally, select the minimum-weight edge in E that crosses the
cut $(V_1, V_2)$, and use this edge to unite the resulting two minimum spanning trees
into a single spanning tree.

Either  argue  that  the  algorithm  correctly  computes  a  minimum  spanning  tree 
of  G,  or  provide  an  example  for  which  the  algorithm  fails. 


Problems 


23-1  Second-best  minimum  spanning  tree 

Let G = (V, E) be an undirected, connected graph whose weight function is
$w : E \to \mathbb{R}$, and suppose that $|E| \ge |V|$ and all edge weights are distinct.

We define a second-best minimum spanning tree as follows. Let $\mathcal{T}$ be the set
of all spanning trees of G, and let T' be a minimum spanning tree of G. Then
a second-best minimum spanning tree is a spanning tree T such that
$w(T) = \min_{T'' \in \mathcal{T} - \{T'\}} \{w(T'')\}$.

a. Show that the minimum spanning tree is unique, but that the second-best minimum
spanning tree need not be unique.

b. Let T be the minimum spanning tree of G. Prove that G contains edges
$(u, v) \in T$ and $(x, y) \notin T$ such that $T - \{(u, v)\} \cup \{(x, y)\}$ is a second-best
minimum spanning tree of G.

c. Let T be a spanning tree of G and, for any two vertices $u, v \in V$, let max[u, v]
denote an edge of maximum weight on the unique simple path between u and v
in T. Describe an $O(V^2)$-time algorithm that, given T, computes max[u, v] for
all $u, v \in V$.

d.  Give  an  efficient  algorithm  to  compute  the  second-best  minimum  spanning  tree 
of  G. 

23-2  Minimum  spanning  tree  in  sparse  graphs 

For a very sparse connected graph G = (V, E), we can further improve upon the
$O(E + V \lg V)$ running time of Prim’s algorithm with Fibonacci heaps by preprocessing
G to decrease the number of vertices before running Prim’s algorithm. In
particular, we choose, for each vertex u, the minimum-weight edge (u, v) incident
on u, and we put (u, v) into the minimum spanning tree under construction. We
then contract all chosen edges (see Section B.4). Rather than contracting these
edges one at a time, we first identify sets of vertices that are united into the same
new vertex. Then we create the graph that would have resulted from contracting
these edges one at a time, but we do so by “renaming” edges according to the sets
into which their endpoints were placed. Several edges from the original graph may
be renamed the same as each other. In such a case, only one edge results, and its
weight is the minimum of the weights of the corresponding original edges.

Initially, we set the minimum spanning tree T being constructed to be empty,
and for each edge $(u, v) \in E$, we initialize the attributes (u, v).orig = (u, v)
and (u, v).c = w(u, v). We use the orig attribute to reference the edge from the
initial graph that is associated with an edge in the contracted graph. The c attribute
holds the weight of an edge, and as edges are contracted, we update it according to
the above scheme for choosing edge weights. The procedure MST-REDUCE takes
inputs G and T, and it returns a contracted graph G' with updated attributes orig'
and c'. The procedure also accumulates edges of G into the minimum spanning
tree T.


MST-Reduce(G, T)
    for each v ∈ G.V
        v.mark = FALSE
        Make-Set(v)
    for each u ∈ G.V
        if u.mark == FALSE
            choose v ∈ G.Adj[u] such that (u, v).c is minimized
            Union(u, v)
            T = T ∪ {(u, v).orig}
            u.mark = v.mark = TRUE
    G'.V = {Find-Set(v) : v ∈ G.V}
    G'.E = ∅
    for each (x, y) ∈ G.E
        u = Find-Set(x)
        v = Find-Set(y)
        if (u, v) ∉ G'.E
            G'.E = G'.E ∪ {(u, v)}
            (u, v).orig' = (x, y).orig
            (u, v).c' = (x, y).c
        else if (x, y).c < (u, v).c'
            (u, v).orig' = (x, y).orig
            (u, v).c' = (x, y).c
    construct adjacency lists G'.Adj for G'
    return G' and T

a. Let T be the set of edges returned by MST-Reduce, and let A be the minimum
spanning tree of the graph G' formed by the call MST-Prim(G', c', r), where c'
is the weight attribute on the edges of G'.E and r is any vertex in G'.V. Prove
that T ∪ {(x, y).orig' : (x, y) ∈ A} is a minimum spanning tree of G.

b. Argue that |G'.V| ≤ |V|/2.

c. Show how to implement MST-Reduce so that it runs in O(E) time. (Hint:
Use  simple  data  structures.) 

d.  Suppose  that  we  run  k  phases  of  MST-REDUCE,  using  the  output  G'  produced 
by  one  phase  as  the  input  G  to  the  next  phase  and  accumulating  edges  in  T. 
Argue  that  the  overall  running  time  of  the  k  phases  is  O(kE). 

e. Suppose that after running k phases of MST-Reduce, as in part (d), we run
Prim’s algorithm by calling MST-Prim(G', c', r), where G', with weight attribute
c', is returned by the last phase and r is any vertex in G'.V. Show how
to pick k so that the overall running time is O(E lg lg V). Argue that your
choice of k minimizes the overall asymptotic running time.

f. For what values of |E| (in terms of |V|) does Prim’s algorithm with preprocessing
asymptotically beat Prim’s algorithm without preprocessing?
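
To make the contraction step of MST-Reduce concrete, the following Python sketch runs one phase on an edge list. It is only an illustration under stated assumptions: the function name mst_reduce, the tuple layout (x, y, c, orig), and the small union-find helper are ours rather than the book's, the graph is assumed connected with at least two vertices, and self-loops created by the renaming are dropped. It does not attempt the O(E) bound asked for in part (c).

```python
def mst_reduce(vertices, edges):
    """One contraction phase in the spirit of MST-Reduce (illustrative sketch).

    vertices: list of hashable vertex names.
    edges: list of (x, y, c, orig) tuples of an undirected graph, where c is
           the current weight and orig names the corresponding original edge.
    Returns (new_vertices, new_edges, chosen); chosen holds the original
    edges added to the tree under construction in this phase.
    """
    parent = {v: v for v in vertices}            # Make-Set for every vertex

    def find(v):                                  # Find-Set with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Adjacency lists, so that each vertex can scan its incident edges.
    adj = {v: [] for v in vertices}
    for x, y, c, orig in edges:
        adj[x].append((c, y, orig))
        adj[y].append((c, x, orig))

    mark = {v: False for v in vertices}
    chosen = set()
    for u in vertices:
        if not mark[u]:
            # Choose the minimum-weight edge incident on u.
            c, v, orig = min(adj[u], key=lambda t: t[0])
            parent[find(u)] = find(v)             # Union(u, v)
            chosen.add(orig)
            mark[u] = mark[v] = True

    # Rename each edge by the sets of its endpoints; keep the lightest copy.
    best = {}
    for x, y, c, orig in edges:
        u, v = find(x), find(y)
        if u == v:
            continue                              # assumption: drop self-loops
        key = frozenset((u, v))
        if key not in best or c < best[key][0]:
            best[key] = (c, orig)

    new_vertices = {find(v) for v in vertices}
    new_edges = [(*tuple(key), c, orig) for key, (c, orig) in best.items()]
    return new_vertices, new_edges, chosen


V = ["a", "b", "c", "d"]
E = [("a", "b", 1, ("a", "b")), ("b", "c", 4, ("b", "c")),
     ("c", "d", 2, ("c", "d")), ("a", "d", 5, ("a", "d")),
     ("a", "c", 3, ("a", "c"))]
new_V, new_E, chosen = mst_reduce(V, E)
```

On this hypothetical four-vertex graph, the phase picks the edges of weight 1 and 2, halves the number of vertices, and leaves a single contracted edge whose orig attribute points back to the weight-3 edge (a, c).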

23-3  Bottleneck  spanning  tree 

A  bottleneck  spanning  tree  T  of  an  undirected  graph  G  is  a  spanning  tree  of  G 
whose  largest  edge  weight  is  minimum  over  all  spanning  trees  of  G.  We  say  that 
the  value  of  the  bottleneck  spanning  tree  is  the  weight  of  the  maximum-weight 
edge  in  T . 

a.  Argue  that  a  minimum  spanning  tree  is  a  bottleneck  spanning  tree. 

Part  (a)  shows  that  finding  a  bottleneck  spanning  tree  is  no  harder  than  finding 
a  minimum  spanning  tree.  In  the  remaining  parts,  we  will  show  how  to  find  a 
bottleneck  spanning  tree  in  linear  time. 

b. Give a linear-time algorithm that, given a graph G and an integer b, determines
whether  the  value  of  the  bottleneck  spanning  tree  is  at  most  b. 

c.  Use  your  algorithm  for  part  (b)  as  a  subroutine  in  a  linear-time  algorithm  for 
the  bottleneck-spanning-tree  problem.  (Hint:  You  may  want  to  use  a  subroutine 
that  contracts  sets  of  edges,  as  in  the  MST-Reduce  procedure  described  in 
Problem 23-2.)

23-4 Alternative minimum-spanning-tree algorithms

In  this  problem,  we  give  pseudocode  for  three  different  algorithms.  Each  one  takes 
a  connected  graph  and  a  weight  function  as  input  and  returns  a  set  of  edges  T .  For 
each  algorithm,  either  prove  that  T  is  a  minimum  spanning  tree  or  prove  that  T  is 
not  a  minimum  spanning  tree.  Also  describe  the  most  efficient  implementation  of 
each  algorithm,  whether  or  not  it  computes  a  minimum  spanning  tree. 

a. Maybe-MST-A(G, w)
       sort the edges into nonincreasing order of edge weights w
       T = E
       for each edge e, taken in nonincreasing order by weight
           if T − {e} is a connected graph
               T = T − {e}
       return T

b. Maybe-MST-B(G, w)
       T = ∅
       for each edge e, taken in arbitrary order
           if T ∪ {e} has no cycles
               T = T ∪ {e}
       return T

c. Maybe-MST-C(G, w)
       T = ∅
       for each edge e, taken in arbitrary order
           T = T ∪ {e}
           if T has a cycle c
               let e' be a maximum-weight edge on c
               T = T − {e'}
       return T


Chapter  notes 

Tarjan  [330]  surveys  the  minimum-spanning-tree  problem  and  provides  excellent 
advanced  material.  Graham  and  Hell  [151]  compiled  a  history  of  the  minimum- 
spanning-tree  problem. 

Tarjan attributes the first minimum-spanning-tree algorithm to a 1926 paper by
O. Borůvka. Borůvka’s algorithm consists of running O(lg V) iterations of the
procedure MST-Reduce described in Problem 23-2. Kruskal’s algorithm was
reported by Kruskal [222] in 1956. The algorithm commonly known as Prim’s
algorithm was indeed invented by Prim [285], but it was also invented earlier by
V. Jarník in 1930.

Greedy algorithms are effective at finding minimum spanning trees because the
set of forests of a graph forms a graphic matroid. (See
Section 16.4.)

When |E| = Ω(V lg V), Prim’s algorithm, implemented with Fibonacci heaps,
runs in O(E) time. For sparser graphs, using a combination of the ideas from
Prim’s algorithm, Kruskal’s algorithm, and Borůvka’s algorithm, together with advanced
data structures, Fredman and Tarjan [114] give an algorithm that runs in
O(E lg* V) time. Gabow, Galil, Spencer, and Tarjan [120] improved this algorithm
to run in O(E lg lg* V) time. Chazelle [60] gives an algorithm that runs
in O(E α(E, V)) time, where α(E, V) is the functional inverse of Ackermann’s
function. (See the chapter notes for Chapter 21 for a brief discussion of Ackermann’s
function and its inverse.) Unlike previous minimum-spanning-tree algorithms,
Chazelle’s algorithm does not follow the greedy method.

A  related  problem  is  spanning-tree  verification,  in  which  we  are  given  a  graph 
G = (V, E) and a tree T ⊆ E, and we wish to determine whether T is a minimum
spanning tree of G. King [203] gives a linear-time algorithm to verify a spanning
tree, building on earlier work of Komlós [215] and Dixon, Rauch, and Tarjan [90].

The  above  algorithms  are  all  deterministic  and  fall  into  the  comparison-based 
model  described  in  Chapter  8.  Karger,  Klein,  and  Tarjan  [195]  give  a  randomized 
minimum-spanning-tree algorithm that runs in O(V + E) expected time. This
algorithm uses recursion in a manner similar to the linear-time selection algorithm
in Section 9.3: a recursive call on an auxiliary problem identifies a subset of the
edges E' that cannot be in any minimum spanning tree. Another recursive call
on E − E' then finds the minimum spanning tree. The algorithm also uses ideas
from Borůvka’s algorithm and King’s algorithm for spanning-tree verification.

Fredman  and  Willard  [116]  showed  how  to  find  a  minimum  spanning  tree  in 
O(V + E) time using a deterministic algorithm that is not comparison based. Their
algorithm assumes that the data are b-bit integers and that the computer memory
consists of addressable b-bit words.


24 


Single-Source Shortest Paths


Professor Patrick wishes to find the shortest possible route from Phoenix to Indianapolis.
Given a road map of the United States on which the distance between
each  pair  of  adjacent  intersections  is  marked,  how  can  she  determine  this  shortest 
route? 

One possible way would be to enumerate all the routes from Phoenix to Indianapolis,
add up the distances on each route, and select the shortest. It is easy to
see, however, that even disallowing routes that contain cycles, Professor Patrick
would have to examine an enormous number of possibilities, most of which are
simply  not  worth  considering.  For  example,  a  route  from  Phoenix  to  Indianapolis 
that  passes  through  Seattle  is  obviously  a  poor  choice,  because  Seattle  is  several 
hundred  miles  out  of  the  way. 

In this chapter and in Chapter 25, we show how to solve such problems efficiently.
In a shortest-paths problem, we are given a weighted, directed graph
G = (V, E), with weight function w : E → ℝ mapping edges to real-valued
weights. The weight w(p) of path p = ⟨v_0, v_1, ..., v_k⟩ is the sum of the weights
of its constituent edges:

\[ w(p) = \sum_{i=1}^{k} w(v_{i-1}, v_i) . \]

We define the shortest-path weight δ(u, v) from u to v by

\[ \delta(u, v) = \begin{cases} \min\{ w(p) : u \overset{p}{\leadsto} v \} & \text{if there is a path from } u \text{ to } v , \\ \infty & \text{otherwise} . \end{cases} \]

A shortest path from vertex u to vertex v is then defined as any path p with weight
w(p) = δ(u, v).
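
As a small illustration of the definition of w(p), the following Python sketch sums edge weights along a path given as a list of vertices. The function name path_weight and the dictionary w keyed by vertex pairs are assumptions made for this example, not notation from the text.

```python
def path_weight(w, path):
    """Return w(p): the sum of w(v_{i-1}, v_i) over consecutive vertices of path."""
    return sum(w[(path[i - 1], path[i])] for i in range(1, len(path)))


# Hypothetical weights for a three-vertex path s -> a -> b.
w = {("s", "a"): 3, ("a", "b"): -4}
total = path_weight(w, ["s", "a", "b"])   # 3 + (-4)
trivial = path_weight(w, ["s"])           # a path with no edges has weight 0
```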

In  the  Phoenix-to-Indianapolis  example,  we  can  model  the  road  map  as  a  graph: 
vertices represent intersections, edges represent road segments between intersections,
and edge weights represent road distances. Our goal is to find a shortest path
from a given intersection in Phoenix to a given intersection in Indianapolis.

Edge weights can represent metrics other than distances, such as time, cost,
penalties, loss, or any other quantity that accumulates linearly along a path and
that we would want to minimize.

The breadth-first-search algorithm from Section 22.2 is a shortest-paths algorithm
that works on unweighted graphs, that is, graphs in which each edge has unit
weight.  Because  many  of  the  concepts  from  breadth-first  search  arise  in  the  study 
of  shortest  paths  in  weighted  graphs,  you  might  want  to  review  Section  22.2  before 
proceeding. 

Variants 

In this chapter, we shall focus on the single-source shortest-paths problem: given
a graph G = (V, E), we want to find a shortest path from a given source vertex
s ∈ V to each vertex v ∈ V. The algorithm for the single-source problem can
solve many other problems, including the following variants.

Single-destination shortest-paths problem: Find a shortest path to a given destination
vertex t from each vertex v. By reversing the direction of each edge in
the graph, we can reduce this problem to a single-source problem.

Single-pair shortest-path problem: Find a shortest path from u to v for given
vertices u and v. If we solve the single-source problem with source vertex u,
we solve this problem also. Moreover, all known algorithms for this problem
have the same worst-case asymptotic running time as the best single-source
algorithms.

All-pairs shortest-paths problem: Find a shortest path from u to v for every pair
of vertices u and v. Although we can solve this problem by running a single-source
algorithm once from each vertex, we usually can solve it faster. Additionally,
its structure is interesting in its own right. Chapter 25 addresses the
all-pairs problem in detail.
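
The reduction for the single-destination variant amounts to transposing the graph. A minimal Python sketch follows, assuming a weighted adjacency-list representation in which adj[u] is a list of (v, weight) pairs; the name reverse_graph and the sample graph are ours.

```python
def reverse_graph(adj):
    """Return the graph with every edge (u, v) replaced by (v, u), weights kept."""
    rev = {u: [] for u in adj}
    for u, neighbors in adj.items():
        for v, weight in neighbors:
            rev.setdefault(v, []).append((u, weight))
    return rev


# Hypothetical graph with destination t.
adj = {"a": [("b", 2), ("t", 7)], "b": [("t", 1)], "t": []}
rev = reverse_graph(adj)
```

Running any single-source algorithm on rev with source t then yields, for each vertex v, the weight of a shortest path from v to t in the original graph.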

Optimal  substructure  of  a  shortest  path 

Shortest-paths algorithms typically rely on the property that a shortest path between
two vertices contains other shortest paths within it. (The Edmonds-Karp
maximum-flow algorithm in Chapter 26 also relies on this property.) Recall
that optimal substructure is one of the key indicators that dynamic programming
(Chapter 15) and the greedy method (Chapter 16) might apply. Dijkstra’s algorithm,
which we shall see in Section 24.3, is a greedy algorithm, and the Floyd-Warshall
algorithm, which finds shortest paths between all pairs of vertices (see
Section 25.2), is a dynamic-programming algorithm. The following lemma states
the optimal-substructure property of shortest paths more precisely.

Lemma 24.1 (Subpaths of shortest paths are shortest paths)

Given a weighted, directed graph G = (V, E) with weight function w : E → ℝ,
let p = ⟨v_0, v_1, ..., v_k⟩ be a shortest path from vertex v_0 to vertex v_k and, for any
i and j such that 0 ≤ i ≤ j ≤ k, let p_{ij} = ⟨v_i, v_{i+1}, ..., v_j⟩ be the subpath of p
from vertex v_i to vertex v_j. Then, p_{ij} is a shortest path from v_i to v_j.

Proof If we decompose path p into v_0 ⇝ v_i ⇝ v_j ⇝ v_k, with subpaths p_{0i}, p_{ij},
and p_{jk} in that order, then we have that
w(p) = w(p_{0i}) + w(p_{ij}) + w(p_{jk}). Now, assume that there is a path p'_{ij} from v_i
to v_j with weight w(p'_{ij}) < w(p_{ij}). Then, replacing p_{ij} by p'_{ij} in the
decomposition gives a path from v_0
to v_k whose weight w(p_{0i}) + w(p'_{ij}) + w(p_{jk}) is less than w(p), which contradicts
the assumption that p is a shortest path from v_0 to v_k. ■

Negative-weight  edges 

Some instances of the single-source shortest-paths problem may include edges
whose weights are negative. If the graph G = (V, E) contains no negative-weight
cycles reachable from the source s, then for all v ∈ V, the shortest-path
weight δ(s, v) remains well defined, even if it has a negative value. If the graph
contains a negative-weight cycle reachable from s, however, shortest-path weights
are not well defined. No path from s to a vertex on the cycle can be a shortest
path: we can always find a path with lower weight by following the proposed
“shortest” path and then traversing the negative-weight cycle. If there is a negative-weight
cycle on some path from s to v, we define δ(s, v) = −∞.

Figure 24.1 illustrates the effect of negative weights and negative-weight cycles
on shortest-path weights. Because there is only one path from s to a (the
path ⟨s, a⟩), we have δ(s, a) = w(s, a) = 3. Similarly, there is only one path
from s to b, and so δ(s, b) = w(s, a) + w(a, b) = 3 + (−4) = −1. There are
infinitely many paths from s to c: ⟨s, c⟩, ⟨s, c, d, c⟩, ⟨s, c, d, c, d, c⟩, and so on.
Because the cycle ⟨c, d, c⟩ has weight 6 + (−3) = 3 > 0, the shortest path from s
to c is ⟨s, c⟩, with weight δ(s, c) = w(s, c) = 5. Similarly, the shortest path from s
to d is ⟨s, c, d⟩, with weight δ(s, d) = w(s, c) + w(c, d) = 11. Analogously, there
are infinitely many paths from s to e: ⟨s, e⟩, ⟨s, e, f, e⟩, ⟨s, e, f, e, f, e⟩, and so
on. Because the cycle ⟨e, f, e⟩ has weight 3 + (−6) = −3 < 0, however, there
is no shortest path from s to e. By traversing the negative-weight cycle ⟨e, f, e⟩
arbitrarily many times, we can find paths from s to e with arbitrarily large negative
weights, and so δ(s, e) = −∞. Similarly, δ(s, f) = −∞. Because g is reachable
from f, we can also find paths with arbitrarily large negative weights from s to g,
and so δ(s, g) = −∞. Vertices h, i, and j also form a negative-weight cycle. They
are not reachable from s, however, and so δ(s, h) = δ(s, i) = δ(s, j) = ∞.

Figure 24.1 Negative edge weights in a directed graph. The shortest-path weight from source s
appears within each vertex. Because vertices e and f form a negative-weight cycle reachable from s,
they have shortest-path weights of −∞. Because vertex g is reachable from a vertex whose shortest-path
weight is −∞, it, too, has a shortest-path weight of −∞. Vertices such as h, i, and j are not
reachable from s, and so their shortest-path weights are ∞, even though they lie on a negative-weight
cycle.
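
The two cycle arguments for Figure 24.1 can be written out as a short calculation. As a sketch, let k ≥ 0 count how many times a path goes around the cycle; the weight w(s, e) is left symbolic because the discussion above does not state its value.

```latex
% Going k times around the positive-weight cycle <c, d, c> after reaching c:
\[
  w(\langle s, c, \underbrace{d, c, \ldots, d, c}_{k \text{ times}} \rangle)
    = w(s, c) + k \bigl( w(c, d) + w(d, c) \bigr)
    = 5 + 3k \;\ge\; 5 ,
\]
% so the minimum is attained at k = 0 and \delta(s, c) = 5.

% Going k times around the negative-weight cycle <e, f, e> after reaching e:
\[
  w(\langle s, e, \underbrace{f, e, \ldots, f, e}_{k \text{ times}} \rangle)
    = w(s, e) + k \bigl( 3 + (-6) \bigr)
    = w(s, e) - 3k \;\longrightarrow\; -\infty
    \quad \text{as } k \to \infty ,
\]
% so no path attains a minimum and \delta(s, e) = -\infty.
```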

Some shortest-paths algorithms, such as Dijkstra’s algorithm, assume that all
edge weights in the input graph are nonnegative, as in the road-map example. Others,
such as the Bellman-Ford algorithm, allow negative-weight edges in the input
graph and produce a correct answer as long as no negative-weight cycles are
reachable from the source. Typically, if there is such a negative-weight cycle, the
algorithm can detect and report its existence.

Cycles 

Can a shortest path contain a cycle? As we have just seen, it cannot contain a
negative-weight cycle. Nor can it contain a positive-weight cycle, since removing
the cycle from the path produces a path with the same source and destination
vertices and a lower path weight. That is, if p = ⟨v_0, v_1, ..., v_k⟩ is a path and
c = ⟨v_i, v_{i+1}, ..., v_j⟩ is a positive-weight cycle on this path (so that v_i = v_j and
w(c) > 0), then the path p' = ⟨v_0, v_1, ..., v_i, v_{j+1}, v_{j+2}, ..., v_k⟩ has weight
w(p') = w(p) − w(c) < w(p), and so p cannot be a shortest path from v_0 to v_k.

That  leaves  only  0-weight  cycles.  We  can  remove  a  0-weight  cycle  from  any 
path  to  produce  another  path  whose  weight  is  the  same.  Thus,  if  there  is  a  shortest 
path  from  a  source  vertex  s  to  a  destination  vertex  v  that  contains  a  0-weight  cycle, 
then there is another shortest path from s to v without this cycle. As long as a
shortest path has 0-weight cycles, we can repeatedly remove these cycles from the
path until we have a shortest path that is cycle-free. Therefore, without loss of
generality we can assume that when we are finding shortest paths, they have no
cycles, i.e., they are simple paths. Since any acyclic path in a graph G = (V, E)
contains at most |V| distinct vertices, it also contains at most |V| − 1 edges. Thus,
we can restrict our attention to shortest paths of at most |V| − 1 edges.
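
The cycle-removal argument can be mimicked directly on a path stored as a list of vertices. The Python sketch below (the name remove_cycles is ours) splices out every cycle it encounters. It only manipulates the vertex sequence; that the weight does not increase relies on the assumption, as in the text, that the removed cycles have nonnegative weight.

```python
def remove_cycles(path):
    """Splice all cycles out of a vertex sequence and return a simple path."""
    simple = []
    position = {}                  # vertex -> its index in simple
    for v in path:
        if v in position:
            # v closes a cycle: discard everything after its first occurrence.
            cut = position[v]
            for dropped in simple[cut + 1:]:
                del position[dropped]
            del simple[cut + 1:]
        else:
            position[v] = len(simple)
            simple.append(v)
    return simple


looped = ["s", "c", "d", "c", "d", "c"]
simple = remove_cycles(looped)
```

Because the result never repeats a vertex, it has at most |V| vertices and hence at most |V| − 1 edges.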

Representing  shortest  paths 

We often wish to compute not only shortest-path weights, but the vertices on shortest
paths as well. We represent shortest paths similarly to how we represented
breadth-first trees in Section 22.2. Given a graph G = (V, E), we maintain for
each vertex v ∈ V a predecessor v.π that is either another vertex or NIL. The
shortest-paths algorithms in this chapter set the π attributes so that the chain of predecessors
originating at a vertex v runs backwards along a shortest path from s to v.
Thus, given a vertex v for which v.π ≠ NIL, the procedure Print-Path(G, s, v)
from Section 22.2 will print a shortest path from s to v.
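
Following the chain of predecessors is easy to sketch in Python. The function below is not the book's Print-Path (which prints recursively); it is an iterative variant of our own that returns the path as a list, assuming the predecessors are stored in a dictionary pi with None playing the role of NIL and that the predecessor links contain no cycle.

```python
def path_from_predecessors(pi, s, v):
    """Return the vertices from s to v along predecessor links, or None if
    the chain starting at v does not lead back to s."""
    path = []
    current = v
    while current is not None:
        path.append(current)
        if current == s:
            path.reverse()          # the chain was collected backwards
            return path
        current = pi.get(current)
    return None


# Hypothetical predecessor values: s -> c -> d, and h has no predecessor.
pi = {"s": None, "c": "s", "d": "c", "h": None}
route = path_from_predecessors(pi, "s", "d")
```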

In the midst of executing a shortest-paths algorithm, however, the π values might
not indicate shortest paths. As in breadth-first search, we shall be interested in the
predecessor subgraph G_π = (V_π, E_π) induced by the π values. Here again, we
define the vertex set V_π to be the set of vertices of G with non-NIL predecessors,
plus the source s:

\[ V_\pi = \{ v \in V : v.\pi \neq \text{NIL} \} \cup \{ s \} . \]

The directed edge set E_π is the set of edges induced by the π values for vertices
in V_π:

\[ E_\pi = \{ (v.\pi, v) \in E : v \in V_\pi - \{ s \} \} . \]

We shall prove that the π values produced by the algorithms in this chapter have
the property that at termination G_π is a “shortest-paths tree” (informally, a rooted
tree containing a shortest path from the source s to every vertex that is reachable
from s). A shortest-paths tree is like the breadth-first tree from Section 22.2, but it
contains shortest paths from the source defined in terms of edge weights instead of
numbers of edges. To be precise, let G = (V, E) be a weighted, directed graph
with weight function w : E → ℝ, and assume that G contains no negative-weight
cycles reachable from the source vertex s ∈ V, so that shortest paths are well
defined. A shortest-paths tree rooted at s is a directed subgraph G' = (V', E'),
where V' ⊆ V and E' ⊆ E, such that

1. V' is the set of vertices reachable from s in G,

2. G' forms a rooted tree with root s, and

3. for all v ∈ V', the unique simple path from s to v in G' is a shortest path from s
to v in G.

Figure 24.2 (a) A weighted, directed graph with shortest-path weights from source s. (b) The
shaded edges form a shortest-paths tree rooted at the source s. (c) Another shortest-paths tree with
the same root.

Shortest paths are not necessarily unique, and neither are shortest-paths trees. For
example, Figure 24.2 shows a weighted, directed graph and two shortest-paths trees
with the same root.

Relaxation 

The algorithms in this chapter use the technique of relaxation. For each vertex
v ∈ V, we maintain an attribute v.d, which is an upper bound on the weight of
a shortest path from source s to v. We call v.d a shortest-path estimate. We
initialize the shortest-path estimates and predecessors by the following Θ(V)-time
procedure:

Initialize-Single-Source(G, s)
    for each vertex v ∈ G.V
        v.d = ∞
        v.π = NIL
    s.d = 0

After initialization, we have v.π = NIL for all v ∈ V, s.d = 0, and v.d = ∞ for
v ∈ V − {s}.
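
One possible Python rendering of this initialization is sketched below. Storing the d and π attributes in two dictionaries, and using math.inf and None for ∞ and NIL, are our assumptions rather than anything the pseudocode prescribes.

```python
import math


def initialize_single_source(vertices, s):
    """Return (d, pi) with d[v] = infinity and pi[v] = None for all v, and d[s] = 0."""
    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    return d, pi


d, pi = initialize_single_source(["s", "a", "b"], "s")
```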

The process of relaxing an edge (u, v) consists of testing whether we can improve
the shortest path to v found so far by going through u and, if so, updating
v.d and v.π. A relaxation step¹ may decrease the value of the shortest-path
estimate v.d and update v’s predecessor attribute v.π. The following code performs
a relaxation step on edge (u, v) in O(1) time:

Relax(u, v, w)
    if v.d > u.d + w(u, v)
        v.d = u.d + w(u, v)
        v.π = u

Figure 24.3 shows two examples of relaxing an edge, one in which a shortest-path
estimate decreases and one in which no estimate changes.

Figure 24.3 Relaxing an edge (u, v) with weight w(u, v) = 2. The shortest-path estimate of each
vertex appears within the vertex. (a) Because v.d > u.d + w(u, v) prior to relaxation, the value
of v.d decreases. (b) Here, v.d ≤ u.d + w(u, v) before relaxing the edge, and so the relaxation step
leaves v.d unchanged.

¹ It may seem strange that the term “relaxation” is used for an operation that tightens an upper bound.
The use of the term is historical. The outcome of a relaxation step can be viewed as a relaxation
of the constraint v.d ≤ u.d + w(u, v), which, by the triangle inequality (Lemma 24.10), must be
satisfied if u.d = δ(s, u) and v.d = δ(s, v). That is, if v.d ≤ u.d + w(u, v), there is no “pressure”
to satisfy this constraint, so the constraint is “relaxed.”

Each algorithm in this chapter calls Initialize-Single-Source and then repeatedly
relaxes edges. Moreover, relaxation is the only means by which shortest-path
estimates and predecessors change. The algorithms in this chapter differ in
how many times they relax each edge and the order in which they relax edges. Dijkstra’s
algorithm and the shortest-paths algorithm for directed acyclic graphs relax
each edge exactly once. The Bellman-Ford algorithm relaxes each edge |V| − 1
times.
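
A relaxation step translates almost line for line into Python. In the sketch below, the dictionaries d and pi, the weight table w keyed by vertex pairs, and the boolean return value are our own conventions. The estimates in the two examples are made-up numbers for an edge of weight 2, chosen so that one relaxation lowers v.d and the other leaves it unchanged.

```python
def relax(u, v, w, d, pi):
    """Relax edge (u, v); return True if the estimate d[v] decreased."""
    if d[v] > d[u] + w[(u, v)]:
        d[v] = d[u] + w[(u, v)]
        pi[v] = u
        return True
    return False


w = {("u", "v"): 2}

# Case 1: v.d > u.d + w(u, v), so the estimate drops from 9 to 7.
d1, pi1 = {"u": 5, "v": 9}, {"u": None, "v": None}
changed1 = relax("u", "v", w, d1, pi1)

# Case 2: v.d <= u.d + w(u, v), so nothing changes.
d2, pi2 = {"u": 5, "v": 6}, {"u": None, "v": None}
changed2 = relax("u", "v", w, d2, pi2)
```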

Properties  of  shortest  paths  and  relaxation 

To prove the algorithms in this chapter correct, we shall appeal to several properties
of shortest paths and relaxation. We state these properties here, and Section 24.5
proves them formally. For your reference, each property stated here includes
the appropriate lemma or corollary number from Section 24.5. The latter
five of these properties, which refer to shortest-path estimates or the predecessor
subgraph, implicitly assume that the graph is initialized with a call to
Initialize-Single-Source(G, s) and that the only way that shortest-path estimates and the
predecessor subgraph change is by some sequence of relaxation steps.

Triangle inequality (Lemma 24.10)

For any edge (u, v) ∈ E, we have δ(s, v) ≤ δ(s, u) + w(u, v).

Upper-bound property (Lemma 24.11)

We always have v.d ≥ δ(s, v) for all vertices v ∈ V, and once v.d achieves the
value δ(s, v), it never changes.

No-path property (Corollary 24.12)

If there is no path from s to v, then we always have v.d = δ(s, v) = ∞.

Convergence property (Lemma 24.14)

If s ⇝ u → v is a shortest path in G for some u, v ∈ V, and if u.d = δ(s, u) at
any time prior to relaxing edge (u, v), then v.d = δ(s, v) at all times afterward.

Path-relaxation property (Lemma 24.15)

If p = ⟨v_0, v_1, ..., v_k⟩ is a shortest path from s = v_0 to v_k, and we relax the
edges of p in the order (v_0, v_1), (v_1, v_2), ..., (v_{k−1}, v_k), then v_k.d = δ(s, v_k).
This property holds regardless of any other relaxation steps that occur, even if
they are intermixed with relaxations of the edges of p.

Predecessor-subgraph property (Lemma 24.17)

Once v.d = δ(s, v) for all v ∈ V, the predecessor subgraph is a shortest-paths
tree rooted at s.
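
The path-relaxation property can be observed on a tiny example. The Python sketch below is an illustration only: the helper names and the three-vertex graph are ours, and the edge (s, b) of weight 5 is a hypothetical addition. The shortest path from s to b is ⟨s, a, b⟩ with weight −1, and its edges are relaxed in order while other relaxations are interleaved.

```python
import math


def relax(u, v, w, d, pi):
    """Relax edge (u, v) using the weight table w."""
    if d[v] > d[u] + w[(u, v)]:
        d[v] = d[u] + w[(u, v)]
        pi[v] = u


def run_relaxations(vertices, w, s, order):
    """Initialize from source s, relax the edges in the given order, return (d, pi)."""
    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    for u, v in order:
        relax(u, v, w, d, pi)
    return d, pi


w = {("s", "a"): 3, ("a", "b"): -4, ("s", "b"): 5}
# (s, a) and then (a, b) occur in this order, with other steps mixed in.
order = [("a", "b"), ("s", "b"), ("s", "a"), ("s", "b"), ("a", "b")]
d, pi = run_relaxations(["s", "a", "b"], w, "s", order)
```

The first relaxation of (a, b) has no effect because a.d is still ∞, yet once (s, a) and then (a, b) have been relaxed in that order, b.d equals the shortest-path weight.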


Chapter  outline 

Section  24.1  presents  the  Bellman-Ford  algorithm,  which  solves  the  single-source 
shortest-paths  problem  in  the  general  case  in  which  edges  can  have  negative  weight. 
The  Bellman-Ford  algorithm  is  remarkably  simple,  and  it  has  the  further  benefit 
of detecting whether a negative-weight cycle is reachable from the source. Section 24.2
gives a linear-time algorithm for computing shortest paths from a single
source in a directed acyclic graph. Section 24.3 covers Dijkstra’s algorithm, which
has a lower running time than the Bellman-Ford algorithm but requires the edge
weights to be nonnegative. Section 24.4 shows how we can use the Bellman-Ford
algorithm to solve a special case of linear programming. Finally, Section 24.5
proves the properties of shortest paths and relaxation stated above.

We require some conventions for doing arithmetic with infinities. We shall assume
that for any real number a ≠ −∞, we have a + ∞ = ∞ + a = ∞. Also, to
make our proofs hold in the presence of negative-weight cycles, we shall assume
that for any real number a ≠ ∞, we have a + (−∞) = (−∞) + a = −∞.
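
These conventions happen to agree with IEEE 754 floating-point arithmetic, which is one convenient way to realize them in code. The short Python sketch below checks this with math.inf; the variable names are ours. The sum ∞ + (−∞), which the conventions leave undefined, evaluates to NaN.

```python
import math

INF = math.inf
a = 7.0

plus_inf = a + INF            # a + infinity = infinity
also_plus_inf = INF + a       # infinity + a = infinity
minus_inf = a + (-INF)        # a + (-infinity) = -infinity
undefined = INF + (-INF)      # not covered by the conventions: NaN
```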

All  algorithms  in  this  chapter  assume  that  the  directed  graph  G  is  stored  in  the 
adjacency-list  representation.  Additionally,  stored  with  each  edge  is  its  weight,  so 
that as we traverse each adjacency list, we can determine the edge weights in O(1)
time per edge.

24.1 The Bellman-Ford algorithm

The Bellman-Ford algorithm solves the single-source shortest-paths problem in
the general case in which edge weights may be negative. Given a weighted, directed
graph G = (V, E) with source s and weight function w : E → ℝ, the
Bellman-Ford algorithm returns a boolean value indicating whether or not there is
a negative-weight cycle that is reachable from the source. If there is such a cycle,
the algorithm indicates that no solution exists. If there is no such cycle, the
algorithm produces the shortest paths and their weights.

The algorithm relaxes edges, progressively decreasing an estimate v.d on the
weight of a shortest path from the source s to each vertex v ∈ V until it achieves
the actual shortest-path weight δ(s, v). The algorithm returns TRUE if and only if
the graph contains no negative-weight cycles that are reachable from the source.

Bellman-Ford(G, w, s)
1  Initialize-Single-Source(G, s)
2  for i = 1 to |G.V| − 1
3      for each edge (u, v) ∈ G.E
4          Relax(u, v, w)
5  for each edge (u, v) ∈ G.E
6      if v.d > u.d + w(u, v)
7          return FALSE
8  return TRUE

Figure 24.4 shows the execution of the Bellman-Ford algorithm on a graph
with 5 vertices. After initializing the d and π values of all vertices in line 1,
the algorithm makes |V| − 1 passes over the edges of the graph. Each pass is
one iteration of the for loop of lines 2-4 and consists of relaxing each edge of the
graph once. Figures 24.4(b)-(e) show the state of the algorithm after each of the
four passes over the edges. After making |V| − 1 passes, lines 5-8 check for a
negative-weight cycle and return the appropriate boolean value. (We’ll see a little
later why this check works.)

The Bellman-Ford algorithm runs in time O(VE), since the initialization in
line 1 takes Θ(V) time, each of the |V| − 1 passes over the edges in lines 2-4
takes Θ(E) time, and the for loop of lines 5-7 takes O(E) time.
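
For readers who want to experiment, here is a minimal Python sketch of Bellman-Ford. It assumes the graph is given as a vertex list and a list of (u, v, weight) triples and returns the boolean together with the d and π values; these interface choices are ours. The first example uses the edge weights quoted in the discussion of Figure 24.1 for vertices s, a, b, c, and d; the second adds the negative-weight cycle ⟨e, f, e⟩, with the weight 2 on edge (s, e) being a hypothetical value.

```python
import math


def bellman_ford(vertices, edges, s):
    """Return (ok, d, pi); ok is False exactly when a negative-weight cycle
    is reachable from s."""
    # Initialize-Single-Source
    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    # |V| - 1 passes, each relaxing every edge once
    for _ in range(len(vertices) - 1):
        for u, v, weight in edges:
            if d[v] > d[u] + weight:
                d[v] = d[u] + weight
                pi[v] = u
    # If any edge can still be relaxed, a negative-weight cycle is reachable
    for u, v, weight in edges:
        if d[v] > d[u] + weight:
            return False, d, pi
    return True, d, pi


V = ["s", "a", "b", "c", "d"]
E = [("s", "a", 3), ("a", "b", -4), ("s", "c", 5), ("c", "d", 6), ("d", "c", -3)]
ok, d, pi = bellman_ford(V, E, "s")

# Adding the cycle <e, f, e> of weight 3 + (-6) = -3; w(s, e) = 2 is assumed.
V_neg = V + ["e", "f"]
E_neg = E + [("s", "e", 2), ("e", "f", 3), ("f", "e", -6)]
ok_neg, _, _ = bellman_ford(V_neg, E_neg, "s")
```

In the first run the estimates settle at the shortest-path weights 3, −1, 5, and 11 derived for a, b, c, and d in the Figure 24.1 discussion; in the second, the final check finds an edge that can still be relaxed and the function reports False.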

To prove the correctness of the Bellman-Ford algorithm, we start by showing that
if there are no negative-weight cycles, the algorithm computes correct shortest-path
weights for all vertices reachable from the source.

Figure 24.4 The execution of the Bellman-Ford algorithm. The source is vertex s. The d values
appear within the vertices, and shaded edges indicate predecessor values: if edge (u, v) is
shaded, then v.π = u. In this particular example, each pass relaxes the edges in the order
(t, x), (t, y), (t, z), (x, t), (y, x), (y, z), (z, x), (z, s), (s, t), (s, y). (a) The situation just before the
first pass over the edges. (b)-(e) The situation after each successive pass over the edges. The d
and π values in part (e) are the final values. The Bellman-Ford algorithm returns TRUE in this
example.


Lemma 24.2

Let G = (V, E) be a weighted, directed graph with source s and weight
function w : E → ℝ, and assume that G contains no negative-weight cycles that are
reachable from s. Then, after the |V| - 1 iterations of the for loop of lines 2-4
of Bellman-Ford, we have v.d = δ(s, v) for all vertices v that are reachable
from s.

Proof We prove the lemma by appealing to the path-relaxation property.
Consider any vertex v that is reachable from s, and let p = ⟨v_0, v_1, ..., v_k⟩, where
v_0 = s and v_k = v, be any shortest path from s to v. Because shortest paths are
simple, p has at most |V| - 1 edges, and so k ≤ |V| - 1. Each of the |V| - 1
iterations of the for loop of lines 2-4 relaxes all |E| edges. Among the edges relaxed in
the ith iteration, for i = 1, 2, ..., k, is (v_{i-1}, v_i). By the path-relaxation property,
therefore, v.d = v_k.d = δ(s, v_k) = δ(s, v). ■

Corollary 24.3

Let G = (V, E) be a weighted, directed graph with source vertex s and weight
function w : E → ℝ, and assume that G contains no negative-weight cycles that
are reachable from s. Then, for each vertex v ∈ V, there is a path from s to v if
and only if Bellman-Ford terminates with v.d < ∞ when it is run on G.

Proof  The  proof  is  left  as  Exercise  24.1-2.  ■ 

Theorem 24.4 (Correctness of the Bellman-Ford algorithm)

Let Bellman-Ford be run on a weighted, directed graph G = (V, E) with
source s and weight function w : E → ℝ. If G contains no negative-weight cycles
that are reachable from s, then the algorithm returns TRUE, we have v.d = δ(s, v)
for all vertices v ∈ V, and the predecessor subgraph G_π is a shortest-paths tree
rooted at s. If G does contain a negative-weight cycle reachable from s, then the
algorithm returns FALSE.

Proof Suppose that graph G contains no negative-weight cycles that are
reachable from the source s. We first prove the claim that at termination, v.d = δ(s, v)
for all vertices v ∈ V. If vertex v is reachable from s, then Lemma 24.2 proves this
claim. If v is not reachable from s, then the claim follows from the no-path
property. Thus, the claim is proven. The predecessor-subgraph property, along with the
claim, implies that G_π is a shortest-paths tree. Now we use the claim to show that
Bellman-Ford returns TRUE. At termination, we have for all edges (u, v) ∈ E,

    v.d = δ(s, v)
        ≤ δ(s, u) + w(u, v)    (by the triangle inequality)
        = u.d + w(u, v),

and so none of the tests in line 6 causes Bellman-Ford to return FALSE.
Therefore, it returns TRUE.

Now, suppose that graph G contains a negative-weight cycle that is reachable
from the source s; let this cycle be c = ⟨v_0, v_1, ..., v_k⟩, where v_0 = v_k. Then,

\[ \sum_{i=1}^{k} w(v_{i-1}, v_i) < 0 . \tag{24.1} \]


Assume for the purpose of contradiction that the Bellman-Ford algorithm returns
TRUE. Thus, v_i.d ≤ v_{i-1}.d + w(v_{i-1}, v_i) for i = 1, 2, ..., k. Summing the
inequalities around cycle c gives us

\[
\sum_{i=1}^{k} v_i.d \le \sum_{i=1}^{k} \bigl( v_{i-1}.d + w(v_{i-1}, v_i) \bigr)
  = \sum_{i=1}^{k} v_{i-1}.d + \sum_{i=1}^{k} w(v_{i-1}, v_i) .
\]

Since v_0 = v_k, each vertex in c appears exactly once in each of the summations
∑_{i=1}^{k} v_i.d and ∑_{i=1}^{k} v_{i-1}.d, and so

\[ \sum_{i=1}^{k} v_i.d = \sum_{i=1}^{k} v_{i-1}.d . \]

Moreover, by Corollary 24.3, v_i.d is finite for i = 1, 2, ..., k. Thus,

\[ 0 \le \sum_{i=1}^{k} w(v_{i-1}, v_i) , \]

which  contradicts  inequality  (24.1).  We  conclude  that  the  Bellman-Ford  algorithm 
returns  TRUE  if  graph  G  contains  no  negative-weight  cycles  reachable  from  the 
source,  and  FALSE  otherwise.  ■ 


Exercises 


24.1-1 

Run the Bellman-Ford algorithm on the directed graph of Figure 24.4, using
vertex z as the source. In each pass, relax edges in the same order as in the figure, and
show the d and π values after each pass. Now, change the weight of edge (z, x)
to 4 and run the algorithm again, using s as the source.


24.1-2 

Prove  Corollary  24.3. 


24.1-3 

Given a weighted, directed graph G = (V, E) with no negative-weight cycles,
let m be the maximum over all vertices v ∈ V of the minimum number of edges
in a shortest path from the source s to v. (Here, the shortest path is by weight, not
the number of edges.) Suggest a simple change to the Bellman-Ford algorithm that
allows it to terminate in m + 1 passes, even if m is not known in advance.


24.1-4 

Modify the Bellman-Ford algorithm so that it sets v.d to -∞ for all vertices v for
which there is a negative-weight cycle on some path from the source to v.


24.1-5 ★

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ.
Give an O(VE)-time algorithm to find, for each vertex v ∈ V, the value δ*(v) =
min_{u ∈ V} {δ(u, v)}.


24.1-6 ★

Suppose that a weighted, directed graph G = (V, E) has a negative-weight cycle.
Give an efficient algorithm to list the vertices of one such cycle. Prove that your
algorithm is correct.


24.2  Single-source  shortest  paths  in  directed  acyclic  graphs 

By relaxing the edges of a weighted dag (directed acyclic graph) G = (V, E)
according to a topological sort of its vertices, we can compute shortest paths from
a single source in Θ(V + E) time. Shortest paths are always well defined in a dag,
since even if there are negative-weight edges, no negative-weight cycles can exist.

The algorithm starts by topologically sorting the dag (see Section 22.4) to
impose a linear ordering on the vertices. If the dag contains a path from vertex u to
vertex v, then u precedes v in the topological sort. We make just one pass over the
vertices in the topologically sorted order. As we process each vertex, we relax each
edge that leaves the vertex.

Dag-Shortest-Paths(G, w, s)
1  topologically sort the vertices of G
2  Initialize-Single-Source(G, s)
3  for each vertex u, taken in topologically sorted order
4      for each vertex v ∈ G.Adj[u]
5          Relax(u, v, w)

Figure  24.5  shows  the  execution  of  this  algorithm. 

The running time of this algorithm is easy to analyze. As shown in Section 22.4,
the topological sort of line 1 takes Θ(V + E) time. The call of
Initialize-Single-Source in line 2 takes Θ(V) time. The for loop of lines 3-5 makes one
iteration per vertex. Altogether, the for loop of lines 4-5 relaxes each edge exactly
once. (We have used an aggregate analysis here.) Because each iteration of the
inner for loop takes Θ(1) time, the total running time is Θ(V + E), which is linear
in the size of an adjacency-list representation of the graph.
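
A short Python sketch may make the single pass concrete. It assumes, for illustration only, that the dag is a dictionary mapping every vertex to a list of (neighbor, weight) pairs, and it obtains the topological order from depth-first finishing times in the manner of Section 22.4. The name dag_shortest_paths is ours, and the recursive search is only suitable for small graphs.

```python
import math


def dag_shortest_paths(adj, source):
    """adj maps each vertex to a list of (neighbor, weight) pairs.
    The graph is assumed to be acyclic; this is not checked."""
    order, seen = [], set()

    def visit(u):
        # Depth-first search; u finishes after all of its descendants.
        seen.add(u)
        for v, _ in adj[u]:
            if v not in seen:
                visit(v)
        order.append(u)

    for u in adj:
        if u not in seen:
            visit(u)
    order.reverse()  # decreasing finishing time gives a topological order

    d = {v: math.inf for v in adj}
    pi = {v: None for v in adj}
    d[source] = 0
    for u in order:              # one pass in topologically sorted order
        for v, w in adj[u]:      # relax each edge leaving u
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                pi[v] = u
    return d, pi
```

For the hypothetical dag with edges (s, a, 2), (s, b, 6), (a, b, -3), (a, c, 4), and (b, c, 1), the sketch yields d values 0, 2, -1, and 0 for s, a, b, and c, even though one edge weight is negative.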

The following theorem shows that the Dag-Shortest-Paths procedure
correctly computes the shortest paths.

Figure 24.5 The execution of the algorithm for shortest paths in a directed acyclic graph. The
vertices are topologically sorted from left to right. The source vertex is s. The d values appear
within the vertices, and shaded edges indicate the π values. (a) The situation before the first iteration
of the for loop of lines 3-5. (b)-(g) The situation after each iteration of the for loop of lines 3-5.
The newly blackened vertex in each iteration was used as u in that iteration. The values shown in
part (g) are the final values.

Theorem 24.5

If a weighted, directed graph G = (V, E) has source vertex s and no cycles, then
at the termination of the Dag-Shortest-Paths procedure, v.d = δ(s, v) for all
vertices v ∈ V, and the predecessor subgraph G_π is a shortest-paths tree.

Proof We first show that v.d = δ(s, v) for all vertices v ∈ V at termination.
If v is not reachable from s, then v.d = δ(s, v) = ∞ by the no-path
property. Now, suppose that v is reachable from s, so that there is a shortest
path p = ⟨v_0, v_1, ..., v_k⟩, where v_0 = s and v_k = v. Because we process
the vertices in topologically sorted order, we relax the edges on p in the
order (v_0, v_1), (v_1, v_2), ..., (v_{k-1}, v_k). The path-relaxation property implies that
v_i.d = δ(s, v_i) at termination for i = 0, 1, ..., k. Finally, by the
predecessor-subgraph property, G_π is a shortest-paths tree. ■

An interesting application of this algorithm arises in determining critical paths
in PERT chart² analysis. Edges represent jobs to be performed, and edge weights
represent the times required to perform particular jobs. If edge (u, v) enters
vertex v and edge (v, x) leaves v, then job (u, v) must be performed before job (v, x).
A path through this dag represents a sequence of jobs that must be performed in a
particular order. A critical path is a longest path through the dag, corresponding
to the longest time to perform any sequence of jobs. Thus, the weight of a critical
path provides a lower bound on the total time to perform all the jobs. We can find
a critical path by either

•  negating the edge weights and running Dag-Shortest-Paths, or

•  running Dag-Shortest-Paths, with the modification that we replace “∞”
by “-∞” in line 2 of Initialize-Single-Source and “>” by “<” in the
Relax procedure.
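
The second option can be sketched in Python as follows. The sketch makes several assumptions of its own: the dag is a dictionary from each vertex to a list of (neighbor, job time) pairs, the topological order comes from the standard-library graphlib module (Python 3.9 or later), and the name critical_path_lengths is illustrative.

```python
import math
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_lengths(adj, source):
    """Longest-path weights from source in a dag {u: [(v, w), ...]}."""
    # TopologicalSorter expects a mapping from each node to its predecessors.
    preds = {u: set() for u in adj}
    for u in adj:
        for v, _ in adj[u]:
            preds[v].add(u)
    order = list(TopologicalSorter(preds).static_order())

    d = {v: -math.inf for v in adj}  # -infinity replaces infinity
    d[source] = 0
    for u in order:
        for v, w in adj[u]:
            if d[v] < d[u] + w:      # the comparison in Relax is reversed
                d[v] = d[u] + w
    return d
```

With hypothetical jobs (start, a) of time 3, (start, b) of time 2, (a, end) of time 4, and (b, end) of time 7, the longest path to end has weight 9, going through b. Negating every weight and running an unmodified shortest-paths pass would produce the same values with opposite sign.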

Exercises 


24.2-1 

Run  Dag-Shortest-Paths  on  the  directed  graph  of  Figure  24.5,  using  vertex  r 
as  the  source. 


24.2-2 

Suppose we change line 3 of Dag-Shortest-Paths to read
3  for the first |V| - 1 vertices, taken in topologically sorted order
Show that the procedure would remain correct.


24.2-3 

The PERT chart formulation given above is somewhat unnatural. In a more
natural structure, vertices would represent jobs and edges would represent sequencing
constraints; that is, edge (u, v) would indicate that job u must be performed before
job v. We would then assign weights to vertices, not edges. Modify the
Dag-Shortest-Paths procedure so that it finds a longest path in a directed acyclic
graph with weighted vertices in linear time.


² “PERT” is an acronym for “program evaluation and review technique.”

24.2-4

Give  an  efficient  algorithm  to  count  the  total  number  of  paths  in  a  directed  acyclic 
graph.  Analyze  your  algorithm. 


24.3  Dijkstra’s  algorithm 

Dijkstra’s algorithm solves the single-source shortest-paths problem on a weighted,
directed graph G = (V, E) for the case in which all edge weights are nonnegative.
In this section, therefore, we assume that w(u, v) ≥ 0 for each edge (u, v) ∈ E. As
we shall see, with a good implementation, the running time of Dijkstra’s algorithm
is lower than that of the Bellman-Ford algorithm.

Dijkstra’s algorithm maintains a set S of vertices whose final shortest-path
weights from the source s have already been determined. The algorithm
repeatedly selects the vertex u ∈ V - S with the minimum shortest-path estimate, adds u
to S, and relaxes all edges leaving u. In the following implementation, we use a
min-priority queue Q of vertices, keyed by their d values.

Dijkstra(G, w, s)
1  Initialize-Single-Source(G, s)
2  S = ∅
3  Q = G.V
4  while Q ≠ ∅
5      u = Extract-Min(Q)
6      S = S ∪ {u}
7      for each vertex v ∈ G.Adj[u]
8          Relax(u, v, w)

Dijkstra’s algorithm relaxes edges as shown in Figure 24.6. Line 1 initializes
the d and π values in the usual way, and line 2 initializes the set S to the empty
set. The algorithm maintains the invariant that Q = V - S at the start of each
iteration of the while loop of lines 4-8. Line 3 initializes the min-priority queue Q
to contain all the vertices in V; since S = ∅ at that time, the invariant is true after
line 3. Each time through the while loop of lines 4-8, line 5 extracts a vertex u from
Q = V - S and line 6 adds it to set S, thereby maintaining the invariant. (The first
time through this loop, u = s.) Vertex u, therefore, has the smallest shortest-path
estimate of any vertex in V - S. Then, lines 7-8 relax each edge (u, v) leaving u,
thus updating the estimate v.d and the predecessor v.π if we can improve the
shortest path to v found so far by going through u. Observe that the algorithm
never inserts vertices into Q after line 3 and that each vertex is extracted from Q
and added to S exactly once, so that the while loop of lines 4-8 iterates exactly |V|
times.

Figure 24.6 The execution of Dijkstra’s algorithm. The source s is the leftmost vertex. The
shortest-path estimates appear within the vertices, and shaded edges indicate predecessor values.
Black vertices are in the set S, and white vertices are in the min-priority queue Q = V - S. (a) The
situation just before the first iteration of the while loop of lines 4-8. The shaded vertex has the
minimum d value and is chosen as vertex u in line 5. (b)-(f) The situation after each successive iteration
of the while loop. The shaded vertex in each part is chosen as vertex u in line 5 of the next iteration.
The d values and predecessors shown in part (f) are the final values.
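
For readers who want to experiment, here is a minimal Python sketch of the Dijkstra procedure. It departs from the pseudocode in one labeled way: Python's heapq module offers no Decrease-Key operation, so instead of placing every vertex in Q up front, the sketch pushes a fresh (estimate, vertex) entry whenever an estimate improves and skips stale entries on extraction. The dictionary-based graph format and the name dijkstra are assumptions made for illustration.

```python
import heapq
import math


def dijkstra(adj, source):
    """adj maps each vertex to a list of (neighbor, weight) pairs,
    with every weight assumed to be nonnegative."""
    d = {v: math.inf for v in adj}
    pi = {v: None for v in adj}
    d[source] = 0
    S = set()                    # vertices whose d value is final
    heap = [(0, source)]         # may contain outdated entries
    while heap:
        du, u = heapq.heappop(heap)   # plays the role of Extract-Min
        if u in S:
            continue                  # stale entry; u was extracted earlier
        S.add(u)
        for v, w in adj[u]:           # relax each edge leaving u
            if du + w < d[v]:
                d[v] = du + w
                pi[v] = u
                heapq.heappush(heap, (d[v], v))  # stands in for Decrease-Key
    return d, pi
```

On a hypothetical five-vertex graph with edges (s, t, 10), (s, y, 5), (t, x, 1), (t, y, 2), (y, t, 3), (y, x, 9), (y, z, 2), (x, z, 4), (z, x, 6), and (z, s, 7), the sketch extracts the vertices in the order s, y, z, t, x and returns d values 0, 8, 9, 5, and 7 for s, t, x, y, and z. Vertices unreachable from the source are never extracted in this variant and simply keep d = ∞.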

Because Dijkstra’s algorithm always chooses the “lightest” or “closest” vertex
in V - S to add to set S, we say that it uses a greedy strategy. Chapter 16 explains
greedy strategies in detail, but you need not have read that chapter to understand
Dijkstra’s algorithm. Greedy strategies do not always yield optimal results in
general, but as the following theorem and its corollary show, Dijkstra’s algorithm does
indeed compute shortest paths. The key is to show that each time it adds a vertex u
to set S, we have u.d = δ(s, u).

Theorem 24.6 (Correctness of Dijkstra’s algorithm)

Dijkstra’s algorithm, run on a weighted, directed graph G = (V, E) with
nonnegative weight function w and source s, terminates with u.d = δ(s, u) for all
vertices u ∈ V.

Figure 24.7 The proof of Theorem 24.6. Set S is nonempty just before vertex u is added to it. We
decompose a shortest path p from source s to vertex u into s ⇝ x → y ⇝ u, with subpath p_1 from s
to x and subpath p_2 from y to u, where y is the first
vertex on the path that is not in S and x ∈ S immediately precedes y. Vertices x and y are distinct,
but we may have s = x or y = u. Path p_2 may or may not reenter set S.

Proof We use the following loop invariant:

    At the start of each iteration of the while loop of lines 4-8, v.d = δ(s, v)
    for each vertex v ∈ S.

It suffices to show for each vertex u ∈ V, we have u.d = δ(s, u) at the time when u
is added to set S. Once we show that u.d = δ(s, u), we rely on the upper-bound
property to show that the equality holds at all times thereafter.

Initialization: Initially, S = ∅, and so the invariant is trivially true.

Maintenance: We wish to show that in each iteration, u.d = δ(s, u) for the vertex
added to set S. For the purpose of contradiction, let u be the first vertex for
which u.d ≠ δ(s, u) when it is added to set S. We shall focus our attention
on the situation at the beginning of the iteration of the while loop in which u
is added to S and derive the contradiction that u.d = δ(s, u) at that time by
examining a shortest path from s to u. We must have u ≠ s because s is the
first vertex added to set S and s.d = δ(s, s) = 0 at that time. Because u ≠ s,
we also have that S ≠ ∅ just before u is added to S. There must be some
path from s to u, for otherwise u.d = δ(s, u) = ∞ by the no-path property,
which would violate our assumption that u.d ≠ δ(s, u). Because there is at
least one path, there is a shortest path p from s to u. Prior to adding u to S,
path p connects a vertex in S, namely s, to a vertex in V - S, namely u. Let us
consider the first vertex y along p such that y ∈ V - S, and let x ∈ S be y’s
predecessor along p. Thus, as Figure 24.7 illustrates, we can decompose path p
into s ⇝ x → y ⇝ u, where p_1 is the subpath from s to x and p_2 is the subpath
from y to u. (Either of paths p_1 or p_2 may have no edges.)

We claim that y.d = δ(s, y) when u is added to S. To prove this claim,
observe that x ∈ S. Then, because we chose u as the first vertex for which
u.d ≠ δ(s, u) when it is added to S, we had x.d = δ(s, x) when x was added
to S. Edge (x, y) was relaxed at that time, and the claim follows from the
convergence property.

We can now obtain a contradiction to prove that u.d = δ(s, u). Because y
appears before u on a shortest path from s to u and all edge weights are
nonnegative (notably those on path p_2), we have δ(s, y) ≤ δ(s, u), and thus

    y.d = δ(s, y)
        ≤ δ(s, u)                                    (24.2)
        ≤ u.d    (by the upper-bound property).

But because both vertices u and y were in V - S when u was chosen in line 5,
we have u.d ≤ y.d. Thus, the two inequalities in (24.2) are in fact equalities,
giving

    y.d = δ(s, y) = δ(s, u) = u.d.

Consequently, u.d = δ(s, u), which contradicts our choice of u. We conclude
that u.d = δ(s, u) when u is added to S, and that this equality is maintained at
all times thereafter.

Termination: At termination, Q = ∅ which, along with our earlier invariant that
Q = V - S, implies that S = V. Thus, u.d = δ(s, u) for all vertices u ∈ V. ■

Corollary  24.7 

If we run Dijkstra’s algorithm on a weighted, directed graph G = (V, E) with
nonnegative weight function w and source s, then at termination, the predecessor
subgraph G_π is a shortest-paths tree rooted at s.

Proof Immediate from Theorem 24.6 and the predecessor-subgraph property. ■

Analysis

How fast is Dijkstra’s algorithm? It maintains the min-priority queue Q by
calling three priority-queue operations: Insert (implicit in line 3), Extract-Min
(line 5), and Decrease-Key (implicit in Relax, which is called in line 8). The
algorithm calls both Insert and Extract-Min once per vertex. Because each
vertex u ∈ V is added to set S exactly once, each edge in the adjacency list Adj[u]
is examined in the for loop of lines 7-8 exactly once during the course of the
algorithm. Since the total number of edges in all the adjacency lists is |E|, this for
loop iterates a total of |E| times, and thus the algorithm calls Decrease-Key at
most |E| times overall. (Observe once again that we are using aggregate analysis.)

The running time of Dijkstra’s algorithm depends on how we implement the
min-priority queue. Consider first the case in which we maintain the min-priority
queue by taking advantage of the vertices being numbered 1 to |V|. We simply
store v.d in the vth entry of an array. Each Insert and Decrease-Key operation
takes O(1) time, and each Extract-Min operation takes O(V) time (since we
have to search through the entire array), for a total time of O(V^2 + E) = O(V^2).
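
The array-based scheme can be sketched as follows. The sketch assumes, unlike the text, that vertices are numbered 0 to |V| - 1 (Python lists are zero-indexed) and that adj[u] is a list of (v, weight) pairs; the name dijkstra_array is illustrative.

```python
import math


def dijkstra_array(n, adj, source):
    """Vertices are 0..n-1; adj[u] lists (v, w) pairs with w >= 0."""
    d = [math.inf] * n       # d[v] lives in the v-th array entry
    in_S = [False] * n
    d[source] = 0
    for _ in range(n):
        # Extract-Min: scan the whole array, O(V) per extraction.
        u = -1
        for v in range(n):
            if not in_S[v] and (u == -1 or d[v] < d[u]):
                u = v
        in_S[u] = True
        for v, w in adj[u]:
            if d[u] + w < d[v]:
                d[v] = d[u] + w   # Decrease-Key: one array write, O(1)
    return d
```

The outer loop runs |V| times and each scan costs O(V), which accounts for the V^2 term; the relaxations contribute the E term. On the hypothetical input n = 4 with adj = [[(1, 4), (2, 1)], [(3, 1)], [(1, 2), (3, 5)], []] and source 0, the sketch returns [0, 3, 1, 4].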

If the graph is sufficiently sparse (in particular, E = o(V^2 / lg V)), we can
improve the algorithm by implementing the min-priority queue with a binary
min-heap. (As discussed in Section 6.5, the implementation should make sure that
vertices and corresponding heap elements maintain handles to each other.) Each
Extract-Min operation then takes time O(lg V). As before, there are |V| such
operations. The time to build the binary min-heap is O(V). Each Decrease-Key
operation takes time O(lg V), and there are still at most |E| such operations. The
total running time is therefore O((V + E) lg V), which is O(E lg V) if all vertices
are reachable from the source. This running time improves upon the
straightforward O(V^2)-time implementation if E = o(V^2 / lg V).
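
One way to realize the handles mentioned above is to keep, next to the heap array, a dictionary that records each vertex's current heap index. The Python sketch below is an illustration under our own assumptions (the class name IndexedMinHeap, its methods, and the dictionary-based graph format are all hypothetical), and it presumes nonnegative edge weights so that Decrease-Key is never requested for a vertex that has already been extracted.

```python
import math


class IndexedMinHeap:
    """Binary min-heap of items with a handle from each item to its slot."""

    def __init__(self, keys):
        self.key = dict(keys)                 # item -> current key
        self.heap = list(self.key)            # items arranged as a heap
        self.pos = {x: i for i, x in enumerate(self.heap)}  # the handles
        for i in reversed(range(len(self.heap) // 2)):      # O(V) build
            self._sift_down(i)

    def __len__(self):
        return len(self.heap)

    def _swap(self, i, j):
        h = self.heap
        h[i], h[j] = h[j], h[i]
        self.pos[h[i]], self.pos[h[j]] = i, j

    def _sift_up(self, i):
        while i > 0 and self.key[self.heap[i]] < self.key[self.heap[(i - 1) // 2]]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            small = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self.key[self.heap[c]] < self.key[self.heap[small]]:
                    small = c
            if small == i:
                return
            self._swap(i, small)
            i = small

    def extract_min(self):
        top = self.heap[0]
        self._swap(0, len(self.heap) - 1)
        self.heap.pop()
        del self.pos[top]
        if self.heap:
            self._sift_down(0)
        return top

    def decrease_key(self, x, new_key):
        self.key[x] = new_key
        self._sift_up(self.pos[x])            # O(lg V) thanks to the handle


def dijkstra_heap(adj, source):
    """adj maps each vertex to a list of (neighbor, weight >= 0) pairs."""
    d = {v: math.inf for v in adj}
    d[source] = 0
    Q = IndexedMinHeap(d)                     # Q = G.V
    while len(Q) > 0:
        u = Q.extract_min()
        for v, w in adj[u]:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
                Q.decrease_key(v, d[v])
    return d
```

Without the pos dictionary, Decrease-Key would first have to search the heap for the vertex, which costs O(V) and forfeits the advantage of the heap.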

We can in fact achieve a running time of O(V lg V + E) by implementing the
min-priority queue with a Fibonacci heap (see Chapter 19). The amortized cost
of each of the |V| Extract-Min operations is O(lg V), and each Decrease-Key
call, of which there are at most |E|, takes only O(1) amortized time.
Historically, the development of Fibonacci heaps was motivated by the observation
that Dijkstra’s algorithm typically makes many more Decrease-Key calls than
Extract-Min calls, so that any method of reducing the amortized time of each
Decrease-Key operation to o(lg V) without increasing the amortized time of
Extract-Min would yield an asymptotically faster implementation than with
binary heaps.
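
To compare the three implementations, it may help to substitute two extreme edge densities into the bounds O(V^2), O((V + E) lg V), and O(V lg V + E) stated above. The table is only a summary of those substitutions, with E = Θ(V) standing for a sparse graph and E = Θ(V^2) for a dense one.

```latex
\[
\begin{array}{lll}
\text{priority queue} & E = \Theta(V) & E = \Theta(V^2) \\
\hline
\text{array}           & O(V^2)      & O(V^2) \\
\text{binary min-heap} & O(V \lg V)  & O(V^2 \lg V) \\
\text{Fibonacci heap}  & O(V \lg V)  & O(V^2)
\end{array}
\]
```

The Fibonacci-heap bound matches the better of the other two bounds at both extremes.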

Dijkstra’s algorithm resembles both breadth-first search (see Section 22.2) and
Prim’s algorithm for computing minimum spanning trees (see Section 23.2). It is
like breadth-first search in that set S corresponds to the set of black vertices in a
breadth-first search; just as vertices in S have their final shortest-path weights, so
do black vertices in a breadth-first search have their correct breadth-first distances.
Dijkstra’s algorithm is like Prim’s algorithm in that both algorithms use a
min-priority queue to find the “lightest” vertex outside a given set (the set S in Dijkstra’s
algorithm and the tree being grown in Prim’s algorithm), add this vertex into the
set, and adjust the weights of the remaining vertices outside the set accordingly.

Exercises 


24.3-1 

Run Dijkstra’s algorithm on the directed graph of Figure 24.2, first using vertex s
as the source and then using vertex z as the source. In the style of Figure 24.6,
show the d and π values and the vertices in set S after each iteration of the while
loop.


24.3-2

Give a simple example of a directed graph with negative-weight edges for which
Dijkstra’s algorithm produces incorrect answers. Why doesn’t the proof of
Theorem 24.6 go through when negative-weight edges are allowed?


24.3-3 

Suppose we change line 4 of Dijkstra’s algorithm to the following.

4  while |Q| > 1

This change causes the while loop to execute |V| - 1 times instead of |V| times. Is
this proposed algorithm correct?


24.3-4 

Professor Gaedel has written a program that he claims implements Dijkstra’s
algorithm. The program produces v.d and v.π for each vertex v ∈ V. Give an
O(V + E)-time algorithm to check the output of the professor’s program. It should
determine whether the d and π attributes match those of some shortest-paths tree.
You may assume that all edge weights are nonnegative.


24.3-5 

Professor  Newman  thinks  that  he  has  worked  out  a  simpler  proof  of  correctness 
for  Dijkstra’s  algorithm.  He  claims  that  Dijkstra’s  algorithm  relaxes  the  edges  of 
every  shortest  path  in  the  graph  in  the  order  in  which  they  appear  on  the  path,  and 
therefore  the  path-relaxation  property  applies  to  every  vertex  reachable  from  the 
source.  Show  that  the  professor  is  mistaken  by  constructing  a  directed  graph  for 
which  Dijkstra’s  algorithm  could  relax  the  edges  of  a  shortest  path  out  of  order. 


24.3-6 

We are given a directed graph G = (V, E) on which each edge (u, v) ∈ E has an
associated value r(u, v), which is a real number in the range 0 ≤ r(u, v) ≤ 1 that
represents the reliability of a communication channel from vertex u to vertex v.
We interpret r(u, v) as the probability that the channel from u to v will not fail,
and we assume that these probabilities are independent. Give an efficient algorithm
to find the most reliable path between two given vertices.


24.3-7 

Let G = (V, E) be a weighted, directed graph with positive weight function
w : E → {1, 2, ..., W} for some positive integer W, and assume that no two
vertices have the same shortest-path weights from source vertex s. Now suppose that
we define an unweighted, directed graph G′ = (V ∪ V′, E′) by replacing each
edge (u, v) ∈ E with w(u, v) unit-weight edges in series. How many vertices
does G′ have? Now suppose that we run a breadth-first search on G′. Show that
the order in which the breadth-first search of G′ colors vertices in V black is the
same as the order in which Dijkstra’s algorithm extracts the vertices of V from the
priority queue when it runs on G.


24.3-8 

Let G = (V, E) be a weighted, directed graph with nonnegative weight function
w : E → {0, 1, ..., W} for some nonnegative integer W. Modify Dijkstra’s
algorithm to compute the shortest paths from a given source vertex s in O(WV + E)
time.


24.3-9

Modify your algorithm from Exercise 24.3-8 to run in O((V + E) lg W) time.
(Hint: How many distinct shortest-path estimates can there be in V - S at any
point in time?)

24.3-10

Suppose  that  we  are  given  a  weighted,  directed  graph  G  =  (V,  E)  in  which  edges 
that  leave  the  source  vertex  s  may  have  negative  weights,  all  other  edge  weights 
are  nonnegative,  and  there  are  no  negative-weight  cycles.  Argue  that  Dijkstra’s 
algorithm  correctly  finds  shortest  paths  from  s  in  this  graph. 


24.4  Difference  constraints  and  shortest  paths 

Chapter 29 studies the general linear-programming problem, in which we wish to
optimize a linear function subject to a set of linear inequalities. In this section, we
investigate a special case of linear programming that we reduce to finding shortest
paths from a single source. We can then solve the single-source shortest-paths
problem that results by running the Bellman-Ford algorithm, thereby also solving
the linear-programming problem.

Linear  programming 

In the general linear-programming problem, we are given an m × n matrix A,
an m-vector b, and an n-vector c. We wish to find a vector x of n elements that
maximizes the objective function ∑_{i=1}^{n} c_i x_i subject to the m constraints given by
Ax ≤ b.

Although the simplex algorithm, which is the focus of Chapter 29, does not
always run in time polynomial in the size of its input, there are other
linear-programming algorithms that do run in polynomial time. We offer here two reasons
to understand the setup of linear-programming problems. First, if we know that we
can cast a given problem as a polynomial-sized linear-programming problem, then
we immediately have a polynomial-time algorithm to solve the problem. Second,
faster algorithms exist for many special cases of linear programming. For
example, the single-pair shortest-path problem (Exercise 24.4-4) and the maximum-flow
problem (Exercise 26.1-5) are special cases of linear programming.

Sometimes we don’t really care about the objective function; we just wish to find
any feasible solution, that is, any vector x that satisfies Ax ≤ b, or to determine
that no feasible solution exists. We shall focus on one such feasibility problem.

Systems  of  difference  constraints 

In a system of difference constraints, each row of the linear-programming matrix A
contains one 1 and one -1, and all other entries of A are 0. Thus, the constraints
given by Ax ≤ b are a set of m difference constraints involving n unknowns, in
which each constraint is a simple linear inequality of the form

    x_j - x_i ≤ b_k,

where 1 ≤ i, j ≤ n, i ≠ j, and 1 ≤ k ≤ m.

For example, consider the problem of finding a 5-vector x = (x_i) that satisfies

\[
\begin{pmatrix}
 1 & -1 &  0 &  0 &  0 \\
 1 &  0 &  0 &  0 & -1 \\
 0 &  1 &  0 &  0 & -1 \\
-1 &  0 &  1 &  0 &  0 \\
-1 &  0 &  0 &  1 &  0 \\
 0 &  0 & -1 &  1 &  0 \\
 0 &  0 & -1 &  0 &  1 \\
 0 &  0 &  0 & -1 &  1
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}
\le
\begin{pmatrix} 0 \\ -1 \\ 1 \\ 5 \\ 4 \\ -1 \\ -3 \\ -3 \end{pmatrix} .
\]

This problem is equivalent to finding values for the unknowns x_1, x_2, x_3, x_4, x_5,
satisfying the following 8 difference constraints:

    x_1 - x_2 ≤  0,     (24.3)
    x_1 - x_5 ≤ -1,     (24.4)
    x_2 - x_5 ≤  1,     (24.5)
    x_3 - x_1 ≤  5,     (24.6)
    x_4 - x_1 ≤  4,     (24.7)
    x_4 - x_3 ≤ -1,     (24.8)
    x_5 - x_3 ≤ -3,     (24.9)
    x_5 - x_4 ≤ -3.     (24.10)




One solution to this problem is x = (-5, -3, 0, -1, -4), which you can verify
directly by checking each inequality. In fact, this problem has more than one solution.
Another is x′ = (0, 2, 5, 4, 1). These two solutions are related: each component
of x′ is 5 larger than the corresponding component of x. This fact is not mere
coincidence.

Lemma  24.8 

Let x = (x_1, x_2, ..., x_n) be a solution to a system Ax ≤ b of difference
constraints, and let d be any constant. Then x + d = (x_1 + d, x_2 + d, ..., x_n + d)
is a solution to Ax ≤ b as well.

Proof  For each x_i and x_j, we have (x_j + d) - (x_i + d) = x_j - x_i. Thus, if x
satisfies Ax ≤ b, so does x + d. ■
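
As a quick numerical check of Lemma 24.8, the following Python sketch tests the system (24.3)–(24.10) against the solution quoted above and against a shifted copy of it. The function name `satisfies` and the encoding of each constraint x_j - x_i ≤ b as a triple (j, i, b) are assumptions made for this illustration; they are not notation from the text.

```python
# Each triple (j, i, b) encodes the difference constraint x_j - x_i <= b
# with 1-based indices. This encoding is an assumption of the sketch.
CONSTRAINTS = [
    (1, 2, 0), (1, 5, -1), (2, 5, 1), (3, 1, 5),
    (4, 1, 4), (4, 3, -1), (5, 3, -3), (5, 4, -3),
]

def satisfies(x, constraints):
    """Return True if the list x (x[0] holds x_1) meets every constraint."""
    return all(x[j - 1] - x[i - 1] <= b for j, i, b in constraints)

x = [-5, -3, 0, -1, -4]
shifted = [xi + 5 for xi in x]   # x + d with d = 5, as in Lemma 24.8

print(satisfies(x, CONSTRAINTS), satisfies(shifted, CONSTRAINTS))
```

Adding the same constant to every component leaves each difference x_j - x_i unchanged, which is why both checks succeed.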

Systems of difference constraints occur in many different applications. For
example, the unknowns x_i may be times at which events are to occur. Each constraint
states that at least a certain amount of time, or at most a certain amount of time,
must elapse between two events. Perhaps the events are jobs to be performed
during the assembly of a product. If we apply an adhesive that takes 2 hours to set at
time x_1 and we have to wait until it sets to install a part at time x_2, then we have the
constraint that x_2 ≥ x_1 + 2 or, equivalently, that x_1 - x_2 ≤ -2. Alternatively, we
might require that the part be installed after the adhesive has been applied but no
later than the time that the adhesive has set halfway. In this case, we get the pair of
constraints x_2 ≥ x_1 and x_2 ≤ x_1 + 1 or, equivalently, x_1 - x_2 ≤ 0 and x_2 - x_1 ≤ 1.

Constraint  graphs 

We can interpret systems of difference constraints from a graph-theoretic point
of view. In a system Ax ≤ b of difference constraints, we view the m × n
linear-programming matrix A as the transpose of an incidence matrix (see
Exercise 22.1-7) for a graph with n vertices and m edges. Each vertex v_i in the graph,
for i = 1, 2, ..., n, corresponds to one of the n unknown variables x_i. Each
directed edge in the graph corresponds to one of the m inequalities involving two
unknowns.

More formally, given a system Ax ≤ b of difference constraints, the corresponding
constraint graph is a weighted, directed graph G = (V, E), where

V = {v_0, v_1, ..., v_n}

and

E = {(v_i, v_j) : x_j - x_i ≤ b_k is a constraint}
    ∪ {(v_0, v_1), (v_0, v_2), (v_0, v_3), ..., (v_0, v_n)} .

Figure 24.8  The constraint graph corresponding to the system (24.3)–(24.10) of difference
constraints. The value of δ(v_0, v_i) appears in each vertex v_i. One feasible solution to the system is
x = (-5, -3, 0, -1, -4).

The constraint graph contains the additional vertex v_0, as we shall see shortly, to
guarantee that the graph has some vertex which can reach all other vertices. Thus,
the vertex set V consists of a vertex v_i for each unknown x_i, plus an additional
vertex v_0. The edge set E contains an edge for each difference constraint, plus
an edge (v_0, v_i) for each unknown x_i. If x_j - x_i ≤ b_k is a difference constraint,
then the weight of edge (v_i, v_j) is w(v_i, v_j) = b_k. The weight of each edge
leaving v_0 is 0. Figure 24.8 shows the constraint graph for the system (24.3)–(24.10)
of difference constraints.
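
The construction above can be sketched in a few lines of Python. This is only an illustration under assumed conventions: the helper name `constraint_graph`, the integer 0 standing for v_0, and the triple (j, i, b) for the constraint x_j - x_i ≤ b are all choices made here, not part of the text.

```python
def constraint_graph(n, constraints):
    """Build the constraint graph for n unknowns.

    constraints holds triples (j, i, b) meaning x_j - x_i <= b (1-based).
    Returns (vertices, edges); each edge is (tail, head, weight) and the
    integer 0 plays the role of the extra vertex v_0.
    """
    vertices = list(range(n + 1))
    # A constraint x_j - x_i <= b becomes edge (v_i, v_j) with weight b.
    edges = [(i, j, b) for j, i, b in constraints]
    # One zero-weight edge from v_0 to every other vertex.
    edges += [(0, i, 0) for i in range(1, n + 1)]
    return vertices, edges

# The system (24.3)-(24.10) from the text.
SYSTEM = [
    (1, 2, 0), (1, 5, -1), (2, 5, 1), (3, 1, 5),
    (4, 1, 4), (4, 3, -1), (5, 3, -3), (5, 4, -3),
]
vertices, edges = constraint_graph(5, SYSTEM)
print(len(vertices), len(edges))
```

A list of edge triples is used rather than a dictionary keyed by vertex pair so that two constraints on the same pair of unknowns both survive as parallel edges.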

The following theorem shows that we can find a solution to a system of
difference constraints by finding shortest-path weights in the corresponding constraint
graph.

Theorem  24.9 

Given a system Ax ≤ b of difference constraints, let G = (V, E) be the
corresponding constraint graph. If G contains no negative-weight cycles, then

x = (δ(v_0, v_1), δ(v_0, v_2), δ(v_0, v_3), ..., δ(v_0, v_n))    (24.11)

is a feasible solution for the system. If G contains a negative-weight cycle, then
there is no feasible solution for the system.

Proof  We first show that if the constraint graph contains no negative-weight
cycles, then equation (24.11) gives a feasible solution. Consider any edge
(v_i, v_j) ∈ E. By the triangle inequality, δ(v_0, v_j) ≤ δ(v_0, v_i) + w(v_i, v_j) or,
equivalently, δ(v_0, v_j) - δ(v_0, v_i) ≤ w(v_i, v_j). Thus, letting x_i = δ(v_0, v_i) and
x_j = δ(v_0, v_j) satisfies the difference constraint x_j - x_i ≤ w(v_i, v_j) that
corresponds to edge (v_i, v_j).

Now we show that if the constraint graph contains a negative-weight cycle, then
the system of difference constraints has no feasible solution. Without loss of
generality, let the negative-weight cycle be c = ⟨v_1, v_2, ..., v_k⟩, where v_1 = v_k.
(The vertex v_0 cannot be on cycle c, because it has no entering edges.) Cycle c
corresponds to the following difference constraints:

x_2 - x_1 ≤ w(v_1, v_2) ,
x_3 - x_2 ≤ w(v_2, v_3) ,
    ⋮
x_{k-1} - x_{k-2} ≤ w(v_{k-2}, v_{k-1}) ,
x_k - x_{k-1} ≤ w(v_{k-1}, v_k) .

We will assume that x has a solution satisfying each of these k inequalities and then
derive a contradiction. The solution must also satisfy the inequality that results
when we sum the k inequalities together. If we sum the left-hand sides, each
unknown x_i is added in once and subtracted out once (remember that v_1 = v_k
implies x_1 = x_k), so that the left-hand side of the sum is 0. The right-hand side
sums to w(c), and thus we obtain 0 ≤ w(c). But since c is a negative-weight cycle,
w(c) < 0, and we obtain the contradiction that 0 ≤ w(c) < 0. ■

Solving  systems  of  difference  constraints 

Theorem 24.9 tells us that we can use the Bellman-Ford algorithm to solve a
system of difference constraints. Because the constraint graph contains edges
from the source vertex v_0 to all other vertices, any negative-weight cycle in the
constraint graph is reachable from v_0. If the Bellman-Ford algorithm returns
TRUE, then the shortest-path weights give a feasible solution to the system. In
Figure 24.8, for example, the shortest-path weights provide the feasible solution
x = (-5, -3, 0, -1, -4), and by Lemma 24.8, x = (d - 5, d - 3, d, d - 1, d - 4)
is also a feasible solution for any constant d. If the Bellman-Ford algorithm returns
FALSE, there is no feasible solution to the system of difference constraints.

A system of difference constraints with m constraints on n unknowns produces
a graph with n + 1 vertices and n + m edges. Thus, using the Bellman-Ford
algorithm, we can solve the system in O((n + 1)(n + m)) = O(n^2 + nm) time.
Exercise 24.4-5 asks you to modify the algorithm to run in O(nm) time, even if m
is much less than n.
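
The procedure just described can be sketched in Python as follows. This is a minimal illustration, not the book's pseudocode: the function name `solve_difference_constraints`, the use of `None` in place of FALSE, and the triple (j, i, b) for the constraint x_j - x_i ≤ b are assumptions of the sketch. It runs the unmodified O(n^2 + nm) version, with index 0 standing for v_0.

```python
from math import inf

def solve_difference_constraints(n, constraints):
    """Solve x_j - x_i <= b for triples (j, i, b) with 1-based indices.

    Returns [x_1, ..., x_n] (shortest-path weights from v_0), or None
    when the constraint graph has a negative-weight cycle.
    """
    # Edge (v_i, v_j) of weight b per constraint, plus (v_0, v_i) of weight 0.
    edges = [(i, j, b) for j, i, b in constraints]
    edges += [(0, i, 0) for i in range(1, n + 1)]

    d = [inf] * (n + 1)
    d[0] = 0
    for _ in range(n):                 # |V| - 1 = n passes over all edges
        for u, v, w in edges:
            if d[u] + w < d[v]:        # relax edge (u, v)
                d[v] = d[u] + w

    # Any edge that can still be relaxed reveals a negative-weight cycle.
    for u, v, w in edges:
        if d[u] + w < d[v]:
            return None
    return d[1:]

system = [
    (1, 2, 0), (1, 5, -1), (2, 5, 1), (3, 1, 5),
    (4, 1, 4), (4, 3, -1), (5, 3, -3), (5, 4, -3),
]
print(solve_difference_constraints(5, system))   # [-5, -3, 0, -1, -4]
```

On the system (24.3)–(24.10) the sketch returns the same vector as the shortest-path weights shown in Figure 24.8.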

Exercises


24.4-1 

Find a feasible solution or determine that no feasible solution exists for the
following system of difference constraints:

x_1 - x_2 ≤ 1 ,
x_1 - x_4 ≤ -4 ,
x_2 - x_3 ≤ 2 ,
x_2 - x_5 ≤ 7 ,
x_2 - x_6 ≤ 5 ,
x_3 - x_6 ≤ 10 ,
x_4 - x_2 ≤ 2 ,
x_5 - x_1 ≤ -1 ,
x_5 - x_4 ≤ 3 ,
x_6 - x_3 ≤ -8 .

24.4-2

Find a feasible solution or determine that no feasible solution exists for the
following system of difference constraints:

x_1 - x_2 ≤ 4 ,
x_1 - x_5 ≤ 5 ,
x_2 - x_4 ≤ -6 ,
x_3 - x_2 ≤ 1 ,
x_4 - x_1 ≤ 3 ,
x_4 - x_3 ≤ 5 ,
x_4 - x_5 ≤ 10 ,
x_5 - x_3 ≤ -4 ,
x_5 - x_4 ≤ -8 .

24.4-3

Can any shortest-path weight from the new vertex v_0 in a constraint graph be
positive? Explain.


24.4-4 

Express  the  single-pair  shortest-path  problem  as  a  linear  program. 

24.4-5

Show  how  to  modify  the  Bellman-Ford  algorithm  slightly  so  that  when  we  use  it 
to  solve  a  system  of  difference  constraints  with  m  inequalities  on  n  unknowns,  the 
running  time  is  O(nm). 


24.4-6 

Suppose that in addition to a system of difference constraints, we want to handle
equality constraints of the form x_i = x_j + b_k. Show how to adapt the
Bellman-Ford algorithm to solve this variety of constraint system.


24.4-7

Show how to solve a system of difference constraints by a Bellman-Ford-like
algorithm that runs on a constraint graph without the extra vertex v_0.

24.4-8 ★

Let Ax ≤ b be a system of m difference constraints in n unknowns. Show that the
Bellman-Ford algorithm, when run on the corresponding constraint graph,
maximizes Σ_{i=1}^{n} x_i subject to Ax ≤ b and x_i ≤ 0 for all x_i.

24.4-9 ★

Show that the Bellman-Ford algorithm, when run on the constraint graph for a
system Ax ≤ b of difference constraints, minimizes the quantity (max {x_i} - min {x_i})
subject to Ax ≤ b. Explain how this fact might come in handy if the algorithm is
used to schedule construction jobs.

24.4-10

Suppose that every row in the matrix A of a linear program Ax ≤ b corresponds to
a difference constraint, a single-variable constraint of the form x_i ≤ b_k, or a
single-variable constraint of the form -x_i ≤ b_k. Show how to adapt the Bellman-Ford
algorithm to solve this variety of constraint system.

24.4-11

Give an efficient algorithm to solve a system Ax ≤ b of difference constraints
when all of the elements of b are real-valued and all of the unknowns x_i must be
integers.

24.4-12 ★

Give an efficient algorithm to solve a system Ax ≤ b of difference constraints
when all of the elements of b are real-valued and a specified subset of some, but
not necessarily all, of the unknowns x_i must be integers.


24.5  Proofs of shortest-paths properties

Throughout  this  chapter,  our  correctness  arguments  have  relied  on  the  triangle 
inequality,  upper-bound  property,  no-path  property,  convergence  property,  path- 
relaxation  property,  and  predecessor-subgraph  property.  We  stated  these  properties 
without  proof  at  the  beginning  of  this  chapter.  In  this  section,  we  prove  them. 

The  triangle  inequality 

In studying breadth-first search (Section 22.2), we proved as Lemma 22.1 a
simple property of shortest distances in unweighted graphs. The triangle inequality
generalizes the property to weighted graphs.

Lemma 24.10 (Triangle inequality)

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ
and source vertex s. Then, for all edges (u, v) ∈ E, we have

δ(s, v) ≤ δ(s, u) + w(u, v) .


Proof  Suppose that p is a shortest path from source s to vertex v. Then p has
no more weight than any other path from s to v. Specifically, path p has no more
weight than the particular path that takes a shortest path from source s to vertex u
and then takes edge (u, v).

Exercise 24.5-3 asks you to handle the case in which there is no shortest path
from s to v. ■

Effects  of  relaxation  on  shortest-path  estimates 

The  next  group  of  lemmas  describes  how  shortest-path  estimates  are  affected  when 
we  execute  a  sequence  of  relaxation  steps  on  the  edges  of  a  weighted,  directed 
graph  that  has  been  initialized  by  INITIALIZE-SINGLE-SOURCE. 

Lemma 24.11 (Upper-bound property)

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ.
Let s ∈ V be the source vertex, and let the graph be initialized by
INITIALIZE-SINGLE-SOURCE(G, s). Then, v.d ≥ δ(s, v) for all v ∈ V, and this invariant is
maintained over any sequence of relaxation steps on the edges of G. Moreover,
once v.d achieves its lower bound δ(s, v), it never changes.




Proof  We prove the invariant v.d ≥ δ(s, v) for all vertices v ∈ V by induction
over the number of relaxation steps.

For the basis, v.d ≥ δ(s, v) is certainly true after initialization, since v.d = ∞
implies v.d ≥ δ(s, v) for all v ∈ V - {s}, and since s.d = 0 ≥ δ(s, s) (note that
δ(s, s) = -∞ if s is on a negative-weight cycle and 0 otherwise).

For the inductive step, consider the relaxation of an edge (u, v). By the inductive
hypothesis, x.d ≥ δ(s, x) for all x ∈ V prior to the relaxation. The only d value
that may change is v.d. If it changes, we have

v.d = u.d + w(u, v)
    ≥ δ(s, u) + w(u, v)    (by the inductive hypothesis)
    ≥ δ(s, v)              (by the triangle inequality) ,

and so the invariant is maintained.

To see that the value of v.d never changes once v.d = δ(s, v), note that having
achieved its lower bound, v.d cannot decrease because we have just shown that
v.d ≥ δ(s, v), and it cannot increase because relaxation steps do not increase d
values. ■

Corollary 24.12 (No-path property)

Suppose that in a weighted, directed graph G = (V, E) with weight function
w : E → ℝ, no path connects a source vertex s ∈ V to a given vertex v ∈ V.
Then, after the graph is initialized by INITIALIZE-SINGLE-SOURCE(G, s), we
have v.d = δ(s, v) = ∞, and this equality is maintained as an invariant over
any sequence of relaxation steps on the edges of G.

Proof  By the upper-bound property, we always have ∞ = δ(s, v) ≤ v.d, and
thus v.d = ∞ = δ(s, v). ■

Lemma 24.13

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ,
and let (u, v) ∈ E. Then, immediately after relaxing edge (u, v) by executing
RELAX(u, v, w), we have v.d ≤ u.d + w(u, v).

Proof  If, just prior to relaxing edge (u, v), we have v.d > u.d + w(u, v), then
v.d = u.d + w(u, v) afterward. If, instead, v.d ≤ u.d + w(u, v) just before
the relaxation, then neither u.d nor v.d changes, and so v.d ≤ u.d + w(u, v)
afterward. ■
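
For readers who prefer code, the two procedures these lemmas reason about can be rendered in Python roughly as below. This is a sketch under assumptions: the dictionaries `d` and `pi` stand in for the attributes v.d and v.π, `None` stands for NIL, and the three-vertex graph is a made-up example rather than one from the text. The assertion inside the loop checks the conclusion of Lemma 24.13 after every relaxation.

```python
from math import inf

def initialize_single_source(vertices, s):
    """Set every estimate to infinity and every predecessor to None (NIL)."""
    d = {v: inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    return d, pi

def relax(u, v, w, d, pi):
    """Relax edge (u, v): lower d[v] if going through u is cheaper."""
    if d[v] > d[u] + w[(u, v)]:
        d[v] = d[u] + w[(u, v)]
        pi[v] = u

# Hypothetical example graph: s -> a (4), s -> b (1), b -> a (2).
w = {("s", "a"): 4, ("s", "b"): 1, ("b", "a"): 2}
d, pi = initialize_single_source(["s", "a", "b"], "s")
for u, v in [("s", "a"), ("s", "b"), ("b", "a")]:
    relax(u, v, w, d, pi)
    assert d[v] <= d[u] + w[(u, v)]   # the conclusion of Lemma 24.13

print(d, pi)
```

In this run the estimate for a first drops to 4 and then to 3, never rising, which is consistent with the upper-bound property.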

Lemma 24.14 (Convergence property)

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ,
let s ∈ V be a source vertex, and let s ⇝ u → v be a shortest path in G for
some vertices u, v ∈ V. Suppose that G is initialized by
INITIALIZE-SINGLE-SOURCE(G, s) and then a sequence of relaxation steps that includes the call
RELAX(u, v, w) is executed on the edges of G. If u.d = δ(s, u) at any time
prior to the call, then v.d = δ(s, v) at all times after the call.

Proof  By the upper-bound property, if u.d = δ(s, u) at some point prior to
relaxing edge (u, v), then this equality holds thereafter. In particular, after relaxing
edge (u, v), we have

v.d ≤ u.d + w(u, v)        (by Lemma 24.13)
    = δ(s, u) + w(u, v)
    = δ(s, v)              (by Lemma 24.1) .

By the upper-bound property, v.d ≥ δ(s, v), from which we conclude that
v.d = δ(s, v), and this equality is maintained thereafter. ■

Lemma 24.15 (Path-relaxation property)

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ,
and let s ∈ V be a source vertex. Consider any shortest path p = ⟨v_0, v_1, ..., v_k⟩
from s = v_0 to v_k. If G is initialized by INITIALIZE-SINGLE-SOURCE(G, s) and
then a sequence of relaxation steps occurs that includes, in order, relaxing the edges
(v_0, v_1), (v_1, v_2), ..., (v_{k-1}, v_k), then v_k.d = δ(s, v_k) after these relaxations and
at all times afterward. This property holds no matter what other edge relaxations
occur, including relaxations that are intermixed with relaxations of the edges of p.


Proof  We show by induction that after the ith edge of path p is relaxed, we have
v_i.d = δ(s, v_i). For the basis, i = 0, and before any edges of p have been relaxed,
we have from the initialization that v_0.d = s.d = 0 = δ(s, s). By the upper-bound
property, the value of s.d never changes after initialization.

For the inductive step, we assume that v_{i-1}.d = δ(s, v_{i-1}), and we examine
what happens when we relax edge (v_{i-1}, v_i). By the convergence property, after
relaxing this edge, we have v_i.d = δ(s, v_i), and this equality is maintained at all
times thereafter. ■

Relaxation  and  shortest-paths  trees 

We now show that once a sequence of relaxations has caused the shortest-path
estimates to converge to shortest-path weights, the predecessor subgraph G_π induced
by the resulting π values is a shortest-paths tree for G. We start with the
following lemma, which shows that the predecessor subgraph always forms a rooted tree
whose root is the source.

Lemma 24.16

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ,
let s ∈ V be a source vertex, and assume that G contains no negative-weight
cycles that are reachable from s. Then, after the graph is initialized by
INITIALIZE-SINGLE-SOURCE(G, s), the predecessor subgraph G_π forms a rooted tree with
root s, and any sequence of relaxation steps on edges of G maintains this property
as an invariant.

Proof  Initially, the only vertex in G_π is the source vertex, and the lemma is
trivially true. Consider a predecessor subgraph G_π that arises after a sequence of
relaxation steps. We shall first prove that G_π is acyclic. Suppose for the sake of
contradiction that some relaxation step creates a cycle in the graph G_π. Let the
cycle be c = ⟨v_0, v_1, ..., v_k⟩, where v_k = v_0. Then, v_i.π = v_{i-1} for i = 1, 2, ..., k
and, without loss of generality, we can assume that relaxing edge (v_{k-1}, v_k) created
the cycle in G_π.

We claim that all vertices on cycle c are reachable from the source s. Why?
Each vertex on c has a non-NIL predecessor, and so each vertex on c was assigned
a finite shortest-path estimate when it was assigned its non-NIL π value. By the
upper-bound property, each vertex on cycle c has a finite shortest-path weight,
which implies that it is reachable from s.

We shall examine the shortest-path estimates on c just prior to the call
RELAX(v_{k-1}, v_k, w) and show that c is a negative-weight cycle, thereby
contradicting the assumption that G contains no negative-weight cycles that are reachable
from the source. Just before the call, we have v_i.π = v_{i-1} for i = 1, 2, ..., k - 1.
Thus, for i = 1, 2, ..., k - 1, the last update to v_i.d was by the assignment
v_i.d = v_{i-1}.d + w(v_{i-1}, v_i). If v_{i-1}.d changed since then, it decreased. Therefore,
just before the call RELAX(v_{k-1}, v_k, w), we have

v_i.d ≥ v_{i-1}.d + w(v_{i-1}, v_i)  for all i = 1, 2, ..., k - 1 .    (24.12)

Because v_k.π is changed by the call, immediately beforehand we also have the
strict inequality

v_k.d > v_{k-1}.d + w(v_{k-1}, v_k) .

Summing this strict inequality with the k - 1 inequalities (24.12), we obtain the
sum of the shortest-path estimates around cycle c:

\[
\sum_{i=1}^{k} v_i.d \;>\; \sum_{i=1}^{k} \bigl( v_{i-1}.d + w(v_{i-1}, v_i) \bigr)
\;=\; \sum_{i=1}^{k} v_{i-1}.d \;+\; \sum_{i=1}^{k} w(v_{i-1}, v_i) .
\]

Figure 24.9  Showing that a simple path in G_π from source s to vertex v is unique. If there are two
paths p_1 (s ⇝ u ⇝ x → z ⇝ v) and p_2 (s ⇝ u ⇝ y → z ⇝ v), where x ≠ y, then z.π = x
and z.π = y, a contradiction.


But

\[
\sum_{i=1}^{k} v_i.d \;=\; \sum_{i=1}^{k} v_{i-1}.d ,
\]

since each vertex in the cycle c appears exactly once in each summation. This
equality implies

\[
0 \;>\; \sum_{i=1}^{k} w(v_{i-1}, v_i) .
\]

Thus, the sum of weights around the cycle c is negative, which provides the desired
contradiction.

We have now proven that G_π is a directed, acyclic graph. To show that it forms
a rooted tree with root s, it suffices (see Exercise B.5-2) to prove that for each
vertex v ∈ V_π, there is a unique simple path from s to v in G_π.

We first must show that a path from s exists for each vertex in V_π. The
vertices in V_π are those with non-NIL π values, plus s. The idea here is to prove by
induction that a path exists from s to all vertices in V_π. We leave the details as
Exercise 24.5-6.

To complete the proof of the lemma, we must now show that for any vertex
v ∈ V_π, the graph G_π contains at most one simple path from s to v. Suppose
otherwise. That is, suppose that, as Figure 24.9 illustrates, G_π contains two simple paths
from s to some vertex v: p_1, which we decompose into s ⇝ u ⇝ x → z ⇝ v,
and p_2, which we decompose into s ⇝ u ⇝ y → z ⇝ v, where x ≠ y (though u
could be s and z could be v). But then, z.π = x and z.π = y, which implies
the contradiction that x = y. We conclude that G_π contains a unique simple path
from s to v, and thus G_π forms a rooted tree with root s. ■

We can now show that if, after we have performed a sequence of relaxation steps,
all vertices have been assigned their true shortest-path weights, then the
predecessor subgraph G_π is a shortest-paths tree.

Lemma 24.17 (Predecessor-subgraph property)

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ,
let s ∈ V be a source vertex, and assume that G contains no negative-weight cycles
that are reachable from s. Let us call INITIALIZE-SINGLE-SOURCE(G, s) and then
execute any sequence of relaxation steps on edges of G that produces v.d = δ(s, v)
for all v ∈ V. Then, the predecessor subgraph G_π is a shortest-paths tree rooted
at s.

Proof  We must prove that the three properties of shortest-paths trees given on
page 647 hold for G_π. To show the first property, we must show that V_π is the set
of vertices reachable from s. By definition, a shortest-path weight δ(s, v) is finite
if and only if v is reachable from s, and thus the vertices that are reachable from s
are exactly those with finite d values. But a vertex v ∈ V - {s} has been assigned
a finite value for v.d if and only if v.π ≠ NIL. Thus, the vertices in V_π are exactly
those reachable from s.

The  second  property  follows  directly  from  Lemma  24.16. 

It remains, therefore, to prove the last property of shortest-paths trees: for each
vertex v ∈ V_π, the unique simple path s ⇝ v in G_π is a shortest path from s to v
in G. Let p = ⟨v_0, v_1, ..., v_k⟩, where v_0 = s and v_k = v. For i = 1, 2, ..., k,
we have both v_i.d = δ(s, v_i) and v_i.d ≥ v_{i-1}.d + w(v_{i-1}, v_i), from which we
conclude w(v_{i-1}, v_i) ≤ δ(s, v_i) - δ(s, v_{i-1}). Summing the weights along path p
yields

\[
\begin{aligned}
w(p) &= \sum_{i=1}^{k} w(v_{i-1}, v_i) \\
     &\le \sum_{i=1}^{k} \bigl( \delta(s, v_i) - \delta(s, v_{i-1}) \bigr) \\
     &= \delta(s, v_k) - \delta(s, v_0) && \text{(because the sum telescopes)} \\
     &= \delta(s, v_k) && \text{(because } \delta(s, v_0) = \delta(s, s) = 0 \text{)} .
\end{aligned}
\]

Thus, w(p) ≤ δ(s, v_k). Since δ(s, v_k) is a lower bound on the weight of any path
from s to v_k, we conclude that w(p) = δ(s, v_k), and thus p is a shortest path
from s to v = v_k. ■
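
To make the predecessor-subgraph property concrete, the sketch below follows π pointers back from a vertex to the source and sums the edge weights along the recovered path. The helper name `path_from_predecessors`, the dictionary representation of π, and the small example graph are assumptions made for illustration. The loop relies on G_π being acyclic, which Lemma 24.16 guarantees for π values produced by relaxation.

```python
def path_from_predecessors(pi, s, v):
    """Return the vertices on the path s ~> v in the predecessor subgraph.

    pi maps each vertex to its predecessor (None for NIL). Returns None
    when v has no chain of predecessors leading back to s.
    """
    path = [v]
    while path[-1] != s:
        parent = pi[path[-1]]
        if parent is None:        # reached a vertex outside V_pi
            return None
        path.append(parent)
    path.reverse()
    return path

# Hypothetical converged state: edges s -> a (4), s -> b (1), b -> a (2).
w = {("s", "a"): 4, ("s", "b"): 1, ("b", "a"): 2}
d = {"s": 0, "a": 3, "b": 1}
pi = {"s": None, "a": "b", "b": "s"}

p = path_from_predecessors(pi, "s", "a")
weight = sum(w[(u, v)] for u, v in zip(p, p[1:]))
print(p, weight)
```

Here the weight of the recovered path equals the converged estimate for a, as the lemma predicts.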


Exercises 


24.5-1 

Give  two  shortest-paths  trees  for  the  directed  graph  of  Figure  24.2  (on  page  648) 
other  than  the  two  shown. 

24.5-2

Give an example of a weighted, directed graph G = (V, E) with weight function
w : E → ℝ and source vertex s such that G satisfies the following property: For
every edge (u, v) ∈ E, there is a shortest-paths tree rooted at s that contains (u, v)
and another shortest-paths tree rooted at s that does not contain (u, v).


24.5-3

Embellish the proof of Lemma 24.10 to handle cases in which shortest-path
weights are ∞ or -∞.


24.5-4

Let G = (V, E) be a weighted, directed graph with source vertex s, and let G
be initialized by INITIALIZE-SINGLE-SOURCE(G, s). Prove that if a sequence of
relaxation steps sets s.π to a non-NIL value, then G contains a negative-weight
cycle.


24.5-5

Let G = (V, E) be a weighted, directed graph with no negative-weight edges. Let
s ∈ V be the source vertex, and suppose that we allow v.π to be the predecessor
of v on any shortest path to v from source s if v ∈ V - {s} is reachable from s,
and NIL otherwise. Give an example of such a graph G and an assignment of π
values that produces a cycle in G_π. (By Lemma 24.16, such an assignment cannot
be produced by a sequence of relaxation steps.)


24.5-6

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ
and no negative-weight cycles. Let s ∈ V be the source vertex, and let G be
initialized by INITIALIZE-SINGLE-SOURCE(G, s). Prove that for every vertex v ∈ V_π,
there exists a path from s to v in G_π and that this property is maintained as an
invariant over any sequence of relaxations.


24.5-7

Let G = (V, E) be a weighted, directed graph that contains no negative-weight
cycles. Let s ∈ V be the source vertex, and let G be initialized by
INITIALIZE-SINGLE-SOURCE(G, s). Prove that there exists a sequence of |V| - 1 relaxation
steps that produces v.d = δ(s, v) for all v ∈ V.


24.5-8

Let G be an arbitrary weighted, directed graph with a negative-weight cycle
reachable from the source vertex s. Show how to construct an infinite sequence of
relaxations of the edges of G such that every relaxation causes a shortest-path estimate
to change.


Problems


24-1  Yen’s  improvement  to  Bellman-Ford 

Suppose that we order the edge relaxations in each pass of the Bellman-Ford
algorithm as follows. Before the first pass, we assign an arbitrary linear order
v_1, v_2, ..., v_{|V|} to the vertices of the input graph G = (V, E). Then, we
partition the edge set E into E_f ∪ E_b, where E_f = {(v_i, v_j) ∈ E : i < j} and
E_b = {(v_i, v_j) ∈ E : i > j}. (Assume that G contains no self-loops, so that every
edge is in either E_f or E_b.) Define G_f = (V, E_f) and G_b = (V, E_b).

a. Prove that G_f is acyclic with topological sort ⟨v_1, v_2, ..., v_{|V|}⟩ and that G_b is
acyclic with topological sort ⟨v_{|V|}, v_{|V|-1}, ..., v_1⟩.

Suppose that we implement each pass of the Bellman-Ford algorithm in the
following way. We visit each vertex in the order v_1, v_2, ..., v_{|V|}, relaxing edges of E_f
that leave the vertex. We then visit each vertex in the order v_{|V|}, v_{|V|-1}, ..., v_1,
relaxing edges of E_b that leave the vertex.
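
One pass of the scheme just described might be coded as in the following Python sketch. It only illustrates the order of relaxations and does not address parts (a) through (c). The function name `yen_pass`, the dictionary encodings of the weights and estimates, and the three-vertex example are assumptions introduced here.

```python
from math import inf

def yen_pass(order, w, d):
    """Perform one pass, updating the estimates in d in place.

    order lists the vertices as v_1, ..., v_|V|; w maps each edge (u, v)
    to its weight; d maps each vertex to its shortest-path estimate.
    """
    rank = {v: i for i, v in enumerate(order)}
    out_f = {v: [] for v in order}    # edges of E_f grouped by tail
    out_b = {v: [] for v in order}    # edges of E_b grouped by tail
    for (u, v), weight in w.items():
        (out_f if rank[u] < rank[v] else out_b)[u].append((v, weight))

    # First E_f in the order v_1..v_|V|, then E_b in the reverse order.
    for bucket, sequence in ((out_f, order), (out_b, reversed(order))):
        for u in sequence:
            for v, weight in bucket[u]:
                if d[u] + weight < d[v]:     # relax edge (u, v)
                    d[v] = d[u] + weight

# Hypothetical example: order s, x, y with edges s -> y (2) and y -> x (-1).
order = ["s", "x", "y"]
w = {("s", "y"): 2, ("y", "x"): -1}
d = {"s": 0, "x": inf, "y": inf}
yen_pass(order, w, d)
print(d)
```

In this example the forward sweep settles y and the backward sweep then settles x, so a single pass suffices.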

b. Prove that with this scheme, if G contains no negative-weight cycles that are
reachable from the source vertex s, then after only ⌈|V|/2⌉ passes over the
edges, v.d = δ(s, v) for all vertices v ∈ V.

c. Does this scheme improve the asymptotic running time of the Bellman-Ford
algorithm?

24-2  Nesting  boxes 

A d-dimensional box with dimensions (x_1, x_2, ..., x_d) nests within another box
with dimensions (y_1, y_2, ..., y_d) if there exists a permutation π on {1, 2, ..., d}
such that x_{π(1)} < y_1, x_{π(2)} < y_2, ..., x_{π(d)} < y_d.


a. Argue that the nesting relation is transitive.

b. Describe an efficient method to determine whether or not one d-dimensional
box nests inside another.

c. Suppose that you are given a set of n d-dimensional boxes {B_1, B_2, ..., B_n}.
Give an efficient algorithm to find the longest sequence ⟨B_{i_1}, B_{i_2}, ..., B_{i_k}⟩ of
boxes such that B_{i_j} nests within B_{i_{j+1}} for j = 1, 2, ..., k - 1. Express the
running time of your algorithm in terms of n and d.


24-3  Arbitrage

Arbitrage is the use of discrepancies in currency exchange rates to transform one
unit of a currency into more than one unit of the same currency. For example,
suppose that 1 U.S. dollar buys 49 Indian rupees, 1 Indian rupee buys 2 Japanese
yen, and 1 Japanese yen buys 0.0107 U.S. dollars. Then, by converting currencies,
a trader can start with 1 U.S. dollar and buy 49 × 2 × 0.0107 = 1.0486 U.S. dollars,
thus turning a profit of 4.86 percent.
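
The arithmetic in this example can be checked with a few lines of Python. The variable names are illustrative only, and the sketch does no more than multiply the three rates quoted above; it is not a solution to parts (a) or (b).

```python
from math import prod

# Rates from the example: USD -> INR, INR -> JPY, JPY -> USD.
rates = [49, 2, 0.0107]

final_dollars = prod(rates)                 # dollars held after the round trip
profit_percent = (final_dollars - 1) * 100  # gain relative to the 1 dollar start

print(round(final_dollars, 4), round(profit_percent, 2))
```

A product greater than 1 around a cycle of currencies is exactly the condition that part (a) asks you to detect.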

Suppose that we are given n currencies c_1, c_2, ..., c_n and an n × n table R of
exchange rates, such that one unit of currency c_i buys R[i, j] units of currency c_j.

a. Give an efficient algorithm to determine whether or not there exists a sequence
of currencies ⟨c_{i_1}, c_{i_2}, ..., c_{i_k}⟩ such that

R[i_1, i_2] · R[i_2, i_3] ⋯ R[i_{k-1}, i_k] · R[i_k, i_1] > 1 .

Analyze the running time of your algorithm.

b. Give an efficient algorithm to print out such a sequence if one exists. Analyze
the running time of your algorithm.

24-4  Gabow’s  scaling  algorithm  for  single-source  shortest  paths 
A  scaling  algorithm  solves  a  problem  by  initially  considering  only  the  highest- 
order  bit  of  each  relevant  input  value  (such  as  an  edge  weight).  It  then  refines  the 
initial  solution  by  looking  at  the  two  highest-order  bits.  It  progressively  looks  at 
more  and  more  high-order  bits,  refining  the  solution  each  time,  until  it  has  exam¬ 
ined  all  bits  and  computed  the  correct  solution. 

In  this  problem,  we  examine  an  algorithm  for  computing  the  shortest  paths  from 
a  single  source  by  scaling  edge  weights.  We  are  given  a  directed  graph  G  =  (V,  E) 
with nonnegative integer edge weights $w$. Let $W = \max_{(u,v) \in E} \{w(u,v)\}$. Our
goal is to develop an algorithm that runs in $O(E \lg W)$ time. We assume that all
vertices  are  reachable  from  the  source. 

The  algorithm  uncovers  the  bits  in  the  binary  representation  of  the  edge  weights 
one  at  a  time,  from  the  most  significant  bit  to  the  least  significant  bit.  Specifically, 
let $k = \lceil \lg(W + 1) \rceil$ be the number of bits in the binary representation of $W$,
and for $i = 1, 2, \ldots, k$, let $w_i(u,v) = \lfloor w(u,v)/2^{k-i} \rfloor$. That is, $w_i(u,v)$ is the
“scaled-down” version of $w(u,v)$ given by the $i$ most significant bits of $w(u,v)$.
(Thus, $w_k(u,v) = w(u,v)$ for all $(u,v) \in E$.) For example, if $k = 5$ and
$w(u,v) = 25$, which has the binary representation $\langle 11001 \rangle$, then $w_3(u,v) =
\langle 110 \rangle = 6$. As another example with $k = 5$, if $w(u,v) = \langle 00100 \rangle = 4$, then
$w_3(u,v) = \langle 001 \rangle = 1$. Let us define $\delta_i(u,v)$ as the shortest-path weight from
vertex $u$ to vertex $v$ using weight function $w_i$. Thus, $\delta_k(u,v) = \delta(u,v)$ for all
$u, v \in V$. For a given source vertex $s$, the scaling algorithm first computes the
shortest-path weights $\delta_1(s,v)$ for all $v \in V$, then computes $\delta_2(s,v)$ for all $v \in V$,
and so on, until it computes $\delta_k(s,v)$ for all $v \in V$. We assume throughout that
$|E| \ge |V| - 1$, and we shall see that computing $\delta_i$ from $\delta_{i-1}$ takes $O(E)$ time, so
that the entire algorithm takes $O(kE) = O(E \lg W)$ time.
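
The scaled weights can be checked numerically. The following is a minimal Python sketch, not part of the problem statement: the helper name `scaled_weight` is hypothetical, and it assumes that $W = 25$ is the largest edge weight, so that $k = 5$ as in the examples above. Shifting right by $k - i$ bits is the same as taking $\lfloor w/2^{k-i} \rfloor$ for a nonnegative integer $w$.

```python
def scaled_weight(w: int, k: int, i: int) -> int:
    """Return w_i = floor(w / 2**(k - i)): the i most significant of w's k bits."""
    return w >> (k - i)


W = 25                # assumed largest edge weight (illustrative only)
k = W.bit_length()    # equals ceil(lg(W + 1)) for a nonnegative integer W

# Successive scaled versions of the two example weights from the text.
weights_25 = [scaled_weight(25, k, i) for i in range(1, k + 1)]
weights_4 = [scaled_weight(4, k, i) for i in range(1, k + 1)]
print(weights_25)     # [1, 3, 6, 12, 25]
print(weights_4)      # [0, 0, 1, 2, 4]
```

Each entry is either twice its predecessor or twice its predecessor plus 1, which is the property that part (c) asks you to prove.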

a.  Suppose that for all vertices $v \in V$, we have $\delta(s,v) \le |E|$. Show that we can
compute $\delta(s,v)$ for all $v \in V$ in $O(E)$ time.

b.  Show that we can compute $\delta_1(s,v)$ for all $v \in V$ in $O(E)$ time.

Let us now focus on computing $\delta_i$ from $\delta_{i-1}$.

c.  Prove that for $i = 2, 3, \ldots, k$, we have either $w_i(u,v) = 2w_{i-1}(u,v)$ or
$w_i(u,v) = 2w_{i-1}(u,v) + 1$. Then, prove that

\[ 2\delta_{i-1}(s,v) \le \delta_i(s,v) \le 2\delta_{i-1}(s,v) + |V| - 1 \]

for all $v \in V$.

d.  Define for $i = 2, 3, \ldots, k$ and all $(u,v) \in E$,

\[ \hat{w}_i(u,v) = w_i(u,v) + 2\delta_{i-1}(s,u) - 2\delta_{i-1}(s,v) . \]

Prove that for $i = 2, 3, \ldots, k$ and all $u, v \in V$, the “reweighted” value $\hat{w}_i(u,v)$
of edge $(u,v)$ is a nonnegative integer.

e.  Now, define $\hat{\delta}_i(s,v)$ as the shortest-path weight from $s$ to $v$ using the weight
function $\hat{w}_i$. Prove that for $i = 2, 3, \ldots, k$ and all $v \in V$,

\[ \delta_i(s,v) = \hat{\delta}_i(s,v) + 2\delta_{i-1}(s,v) \]

and that $\hat{\delta}_i(s,v) \le |E|$.

f.  Show how to compute $\delta_i(s,v)$ from $\delta_{i-1}(s,v)$ for all $v \in V$ in $O(E)$ time, and
conclude that we can compute $\delta(s,v)$ for all $v \in V$ in $O(E \lg W)$ time.

24-5  Karp’s  minimum  mean-weight  cycle  algorithm 

Let $G = (V, E)$ be a directed graph with weight function $w : E \to \mathbb{R}$, and let
$n = |V|$. We define the mean weight of a cycle $c = \langle e_1, e_2, \ldots, e_k \rangle$ of edges in $E$
to be

\[ \mu(c) = \frac{1}{k} \sum_{i=1}^{k} w(e_i) . \]

Let $\mu^* = \min_c \mu(c)$, where $c$ ranges over all directed cycles in $G$. We call a cycle $c$
for which $\mu(c) = \mu^*$ a minimum mean-weight cycle. This problem investigates
an efficient algorithm for computing $\mu^*$.

Assume without loss of generality that every vertex $v \in V$ is reachable from a
source vertex $s \in V$. Let $\delta(s,v)$ be the weight of a shortest path from $s$ to $v$, and let
$\delta_k(s,v)$ be the weight of a shortest path from $s$ to $v$ consisting of exactly $k$ edges.
If there is no path from $s$ to $v$ with exactly $k$ edges, then $\delta_k(s,v) = \infty$.

a.  Show that if $\mu^* = 0$, then $G$ contains no negative-weight cycles and $\delta(s,v) =
\min_{0 \le k \le n-1} \delta_k(s,v)$ for all vertices $v \in V$.

b.  Show that if $\mu^* = 0$, then

\[ \max_{0 \le k \le n-1} \frac{\delta_n(s,v) - \delta_k(s,v)}{n - k} \ge 0 \]

for all vertices $v \in V$. (Hint: Use both properties from part (a).)

c.  Let $c$ be a 0-weight cycle, and let $u$ and $v$ be any two vertices on $c$. Suppose
that $\mu^* = 0$ and that the weight of the simple path from $u$ to $v$ along the cycle
is $x$. Prove that $\delta(s,v) = \delta(s,u) + x$. (Hint: The weight of the simple path
from $v$ to $u$ along the cycle is $-x$.)

d.  Show that if $\mu^* = 0$, then on each minimum mean-weight cycle there exists a
vertex $v$ such that

\[ \max_{0 \le k \le n-1} \frac{\delta_n(s,v) - \delta_k(s,v)}{n - k} = 0 . \]

(Hint: Show how to extend a shortest path to any vertex on a minimum mean-
weight cycle along the cycle to make a shortest path to the next vertex on the
cycle.)

e.  Show that if $\mu^* = 0$, then

\[ \min_{v \in V} \max_{0 \le k \le n-1} \frac{\delta_n(s,v) - \delta_k(s,v)}{n - k} = 0 . \]

f.  Show that if we add a constant $t$ to the weight of each edge of $G$, then $\mu^*$
increases by $t$. Use this fact to show that

\[ \mu^* = \min_{v \in V} \max_{0 \le k \le n-1} \frac{\delta_n(s,v) - \delta_k(s,v)}{n - k} . \]

g.  Give an $O(VE)$-time algorithm to compute $\mu^*$.

24-6  Bitonic shortest paths

A sequence is bitonic if it monotonically increases and then monotonically
decreases, or if by a circular shift it monotonically increases and then monotonically
decreases. For example, the sequences $\langle 1, 4, 6, 8, 3, -2 \rangle$, $\langle 9, 2, -4, -10, -5 \rangle$, and
$\langle 1, 2, 3, 4 \rangle$ are bitonic, but $\langle 1, 3, 12, 4, 2, 10 \rangle$ is not bitonic. (See Problem 15-3 for
the bitonic euclidean traveling-salesman problem.)

Suppose that we are given a directed graph $G = (V, E)$ with weight function
$w : E \to \mathbb{R}$, where all edge weights are unique, and we wish to find single-source
shortest paths from a source vertex $s$. We are given one additional piece of
information: for each vertex $v \in V$, the weights of the edges along any shortest path
from $s$ to $v$ form a bitonic sequence.

Give  the  most  efficient  algorithm  you  can  to  solve  this  problem,  and  analyze  its 
running  time. 


Chapter  notes 

Dijkstra’s algorithm [88] appeared in 1959, but it contained no mention of a priority
queue. The Bellman-Ford algorithm is based on separate algorithms by Bellman
[38]  and  Ford  [109].  Bellman  describes  the  relation  of  shortest  paths  to  difference 
constraints.  Lawler  [224]  describes  the  linear-time  algorithm  for  shortest  paths  in 
a  dag,  which  he  considers  part  of  the  folklore. 

When edge weights are relatively small nonnegative integers, we have more
efficient algorithms to solve the single-source shortest-paths problem. The sequence
of values returned by the EXTRACT-MIN calls in Dijkstra’s algorithm monotonically
increases over time. As discussed in the chapter notes for Chapter 6, in
this case several data structures can implement the various priority-queue
operations more efficiently than a binary heap or a Fibonacci heap. Ahuja, Mehlhorn,
Orlin, and Tarjan [8] give an algorithm that runs in $O(E + V\sqrt{\lg W})$ time on
graphs with nonnegative edge weights, where $W$ is the largest weight of any edge
in the graph. The best bounds are by Thorup [337], who gives an algorithm that
runs in $O(E \lg \lg V)$ time, and by Raman [291], who gives an algorithm that runs
in $O(E + V \min\{(\lg V)^{1/3+\epsilon}, (\lg W)^{1/4+\epsilon}\})$ time. These two algorithms use an
amount of space that depends on the word size of the underlying machine.
Although the amount of space used can be unbounded in the size of the input, it can
be reduced to be linear in the size of the input using randomized hashing.

For undirected graphs with integer weights, Thorup [336] gives an $O(V + E)$-time
algorithm for single-source shortest paths. In contrast to the algorithms
mentioned in the previous paragraph, this algorithm is not an implementation of
Dijkstra’s algorithm, since the sequence of values returned by EXTRACT-MIN calls
does not monotonically increase over time.

For graphs with negative edge weights, an algorithm due to Gabow and
Tarjan [122] runs in $O(\sqrt{V} E \lg(VW))$ time, and one by Goldberg [137] runs in
$O(\sqrt{V} E \lg W)$ time, where $W = \max_{(u,v) \in E} \{|w(u,v)|\}$.

Cherkassky,  Goldberg,  and  Radzik  [64]  conducted  extensive  experiments  com¬ 
paring  various  shortest-path  algorithms. 


25 


All-Pairs  Shortest  Paths 


In this chapter, we consider the problem of finding shortest paths between all pairs
of vertices in a graph. This problem might arise in making a table of distances
between all pairs of cities for a road atlas. As in Chapter 24, we are given a weighted,
directed graph $G = (V, E)$ with a weight function $w : E \to \mathbb{R}$ that maps edges
to real-valued weights. We wish to find, for every pair of vertices $u, v \in V$, a
shortest (least-weight) path from $u$ to $v$, where the weight of a path is the sum of
the weights of its constituent edges. We typically want the output in tabular form:
the entry in $u$’s row and $v$’s column should be the weight of a shortest path from $u$
to $v$.

We can solve an all-pairs shortest-paths problem by running a single-source
shortest-paths algorithm $|V|$ times, once for each vertex as the source. If all
edge weights are nonnegative, we can use Dijkstra’s algorithm. If we use
the linear-array implementation of the min-priority queue, the running time is
$O(V^3 + VE) = O(V^3)$. The binary min-heap implementation of the min-priority
queue yields a running time of $O(VE \lg V)$, which is an improvement if the graph
is sparse. Alternatively, we can implement the min-priority queue with a Fibonacci
heap, yielding a running time of $O(V^2 \lg V + VE)$.

If the graph has negative-weight edges, we cannot use Dijkstra’s algorithm.
Instead, we must run the slower Bellman-Ford algorithm once from each vertex. The
resulting running time is $O(V^2 E)$, which on a dense graph is $O(V^4)$. In this
chapter we shall see how to do better. We also investigate the relation of the all-pairs
shortest-paths problem to matrix multiplication and study its algebraic structure.

Unlike the single-source algorithms, which assume an adjacency-list
representation of the graph, most of the algorithms in this chapter use an
adjacency-matrix representation. (Johnson’s algorithm for sparse graphs, in Section 25.3,
uses adjacency lists.) For convenience, we assume that the vertices are numbered
$1, 2, \ldots, |V|$, so that the input is an $n \times n$ matrix $W$ representing the edge weights
of an $n$-vertex directed graph $G = (V, E)$. That is, $W = (w_{ij})$, where

\[ w_{ij} = \begin{cases}
0 & \text{if } i = j , \\
\text{the weight of directed edge } (i, j) & \text{if } i \ne j \text{ and } (i, j) \in E , \\
\infty & \text{if } i \ne j \text{ and } (i, j) \notin E .
\end{cases} \tag{25.1} \]


We  allow  negative-weight  edges,  but  we  assume  for  the  time  being  that  the  input 
graph  contains  no  negative-weight  cycles. 
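
As an illustration of the matrix convention in equation (25.1), here is a minimal Python sketch that builds $W$ from an edge list. The function name `weight_matrix` and the sample edges are hypothetical, and the sketch numbers vertices from 0 rather than from 1, as is idiomatic in Python.

```python
import math


def weight_matrix(n, edges):
    """Build the n x n matrix W of equation (25.1) from (i, j, weight) triples.

    Vertices are 0-indexed here (an assumption of this sketch). Diagonal
    entries are 0 and missing edges are represented by math.inf.
    """
    W = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, w in edges:
        if i != j:            # w_ii stays 0 by definition
            W[i][j] = w
    return W


# A small hypothetical graph with one negative edge and no negative cycle.
EDGES = [(0, 1, 3), (0, 2, 8), (1, 2, -2), (2, 3, 1), (3, 0, 4)]
W = weight_matrix(4, EDGES)
for row in W:
    print(row)
```

Using `math.inf` for absent edges is convenient because `math.inf + x` stays infinite for any finite `x`, so sums along nonexistent paths never look short.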

The tabular output of the all-pairs shortest-paths algorithms presented in this
chapter is an $n \times n$ matrix $D = (d_{ij})$, where entry $d_{ij}$ contains the weight of a
shortest path from vertex $i$ to vertex $j$. That is, if we let $\delta(i, j)$ denote the shortest-
path weight from vertex $i$ to vertex $j$ (as in Chapter 24), then $d_{ij} = \delta(i, j)$ at
termination.

To solve the all-pairs shortest-paths problem on an input adjacency matrix, we
need to compute not only the shortest-path weights but also a predecessor matrix
$\Pi = (\pi_{ij})$, where $\pi_{ij}$ is NIL if either $i = j$ or there is no path from $i$ to $j$,
and otherwise $\pi_{ij}$ is the predecessor of $j$ on some shortest path from $i$. Just as
the predecessor subgraph $G_\pi$ from Chapter 24 is a shortest-paths tree for a given
source vertex, the subgraph induced by the $i$th row of the $\Pi$ matrix should be a
shortest-paths tree with root $i$. For each vertex $i \in V$, we define the predecessor
subgraph of $G$ for $i$ as $G_{\pi,i} = (V_{\pi,i}, E_{\pi,i})$, where

\[ V_{\pi,i} = \{ j \in V : \pi_{ij} \ne \text{NIL} \} \cup \{ i \} \]

and

\[ E_{\pi,i} = \{ (\pi_{ij}, j) : j \in V_{\pi,i} - \{ i \} \} . \]

If $G_{\pi,i}$ is a shortest-paths tree, then the following procedure, which is a modified
version of the PRINT-PATH procedure from Chapter 22, prints a shortest path from
vertex $i$ to vertex $j$.


PRINT-ALL-PAIRS-SHORTEST-PATH($\Pi$, $i$, $j$)
1  if $i == j$
2      print $i$
3  elseif $\pi_{ij}$ == NIL
4      print “no path from” $i$ “to” $j$ “exists”
5  else PRINT-ALL-PAIRS-SHORTEST-PATH($\Pi$, $i$, $\pi_{ij}$)
6      print $j$
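
The recursion can be sketched in Python as follows. This is an illustrative variant, not the procedure itself: it returns the list of vertices instead of printing them, uses `None` in place of NIL, numbers vertices from 0, and the name `shortest_path` and the sample predecessor matrix are hypothetical.

```python
def shortest_path(pred, i, j):
    """Return the vertices on a shortest path from i to j, or None if no path.

    pred[i][j] is the predecessor of j on a shortest path from i, or None
    when i == j or when j is unreachable from i.
    """
    if i == j:
        return [i]
    if pred[i][j] is None:
        return None
    # Recurse toward the source, then append j, as in lines 5-6 above.
    return shortest_path(pred, i, pred[i][j]) + [j]


# Predecessor matrix for the hypothetical chain 0 -> 1 -> 2.
PRED = [
    [None, 0, 1],
    [None, None, 1],
    [None, None, None],
]
print(shortest_path(PRED, 0, 2))   # [0, 1, 2]
print(shortest_path(PRED, 2, 0))   # None
```

The recursion depth equals the number of edges on the path, so for very long paths an iterative walk back through `pred` would avoid Python's recursion limit.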

In  order  to  highlight  the  essential  features  of  the  all-pairs  algorithms  in  this  chapter, 
we  won’t  cover  the  creation  and  properties  of  predecessor  matrices  as  extensively 
as  we  dealt  with  predecessor  subgraphs  in  Chapter  24.  Some  of  the  exercises  cover 
the basics.

Chapter outline

Section 25.1 presents a dynamic-programming algorithm based on matrix
multiplication to solve the all-pairs shortest-paths problem. Using the technique of
“repeated squaring,” we can achieve a running time of $\Theta(V^3 \lg V)$. Section 25.2 gives
another dynamic-programming algorithm, the Floyd-Warshall algorithm, which
runs in time $\Theta(V^3)$. Section 25.2 also covers the problem of finding the
transitive closure of a directed graph, which is related to the all-pairs shortest-paths
problem. Finally, Section 25.3 presents Johnson’s algorithm, which solves the
all-pairs shortest-paths problem in $O(V^2 \lg V + VE)$ time and is a good choice for
large, sparse graphs.

Before proceeding, we need to establish some conventions for adjacency-matrix
representations. First, we shall generally assume that the input graph $G = (V, E)$
has $n$ vertices, so that $n = |V|$. Second, we shall use the convention of denoting
matrices by uppercase letters, such as $W$, $L$, or $D$, and their individual elements
by subscripted lowercase letters, such as $w_{ij}$, $l_{ij}$, or $d_{ij}$. Some matrices will have
parenthesized superscripts, as in $L^{(m)} = (l_{ij}^{(m)})$ or $D^{(m)} = (d_{ij}^{(m)})$, to indicate
iterates. Finally, for a given $n \times n$ matrix $A$, we shall assume that the value of $n$ is
stored in the attribute $A.rows$.


25.1  Shortest  paths  and  matrix  multiplication 

This section presents a dynamic-programming algorithm for the all-pairs shortest-
paths problem on a directed graph $G = (V, E)$. Each major loop of the dynamic
program will invoke an operation that is very similar to matrix multiplication, so
that the algorithm will look like repeated matrix multiplication. We shall start by
developing a $\Theta(V^4)$-time algorithm for the all-pairs shortest-paths problem and
then improve its running time to $\Theta(V^3 \lg V)$.

Before  proceeding,  let  us  briefly  recap  the  steps  given  in  Chapter  15  for  devel¬ 
oping  a  dynamic-programming  algorithm. 

1.  Characterize the structure of an optimal solution.

2.  Recursively  define  the  value  of  an  optimal  solution. 

3.  Compute  the  value  of  an  optimal  solution  in  a  bottom-up  fashion. 

We reserve the fourth step, constructing an optimal solution from computed
information, for the exercises.

The structure of a shortest path

We start by characterizing the structure of an optimal solution. For the all-pairs
shortest-paths problem on a graph $G = (V, E)$, we have proven (Lemma 24.1)
that all subpaths of a shortest path are shortest paths. Suppose that we represent
the graph by an adjacency matrix $W = (w_{ij})$. Consider a shortest path $p$ from
vertex $i$ to vertex $j$, and suppose that $p$ contains at most $m$ edges. Assuming that
there are no negative-weight cycles, $m$ is finite. If $i = j$, then $p$ has weight 0
and no edges. If vertices $i$ and $j$ are distinct, then we decompose path $p$ into
$i \overset{p'}{\leadsto} k \to j$, where path $p'$ now contains at most $m - 1$ edges. By Lemma 24.1,
$p'$ is a shortest path from $i$ to $k$, and so $\delta(i, j) = \delta(i, k) + w_{kj}$.

A  recursive  solution  to  the  all-pairs  shortest-paths  problem 

Now, let $l_{ij}^{(m)}$ be the minimum weight of any path from vertex $i$ to vertex $j$ that
contains at most $m$ edges. When $m = 0$, there is a shortest path from $i$ to $j$ with
no edges if and only if $i = j$. Thus,

\[ l_{ij}^{(0)} = \begin{cases} 0 & \text{if } i = j , \\ \infty & \text{if } i \ne j . \end{cases} \]

For $m \ge 1$, we compute $l_{ij}^{(m)}$ as the minimum of $l_{ij}^{(m-1)}$ (the weight of a shortest
path from $i$ to $j$ consisting of at most $m - 1$ edges) and the minimum weight of any
path from $i$ to $j$ consisting of at most $m$ edges, obtained by looking at all possible
predecessors $k$ of $j$. Thus, we recursively define

\[ l_{ij}^{(m)} = \min\Bigl( l_{ij}^{(m-1)}, \min_{1 \le k \le n} \bigl\{ l_{ik}^{(m-1)} + w_{kj} \bigr\} \Bigr)
= \min_{1 \le k \le n} \bigl\{ l_{ik}^{(m-1)} + w_{kj} \bigr\} . \tag{25.2} \]

The latter equality follows since $w_{jj} = 0$ for all $j$.

What are the actual shortest-path weights $\delta(i, j)$? If the graph contains
no negative-weight cycles, then for every pair of vertices $i$ and $j$ for which
$\delta(i, j) < \infty$, there is a shortest path from $i$ to $j$ that is simple and thus contains at
most $n - 1$ edges. A path from vertex $i$ to vertex $j$ with more than $n - 1$ edges
cannot have lower weight than a shortest path from $i$ to $j$. The actual shortest-path
weights are therefore given by

\[ \delta(i, j) = l_{ij}^{(n-1)} = l_{ij}^{(n)} = l_{ij}^{(n+1)} = \cdots . \tag{25.3} \]

Computing the shortest-path weights bottom up

Taking as our input the matrix $W = (w_{ij})$, we now compute a series of matrices
$L^{(1)}, L^{(2)}, \ldots, L^{(n-1)}$, where for $m = 1, 2, \ldots, n - 1$, we have $L^{(m)} = (l_{ij}^{(m)})$.
The final matrix $L^{(n-1)}$ contains the actual shortest-path weights. Observe that
$l_{ij}^{(1)} = w_{ij}$ for all vertices $i, j \in V$, and so $L^{(1)} = W$.

The heart of the algorithm is the following procedure, which, given matrices
$L^{(m-1)}$ and $W$, returns the matrix $L^{(m)}$. That is, it extends the shortest paths
computed so far by one more edge.

EXTEND-SHORTEST-PATHS($L$, $W$)
1  $n = L.rows$
2  let $L' = (l'_{ij})$ be a new $n \times n$ matrix
3  for $i = 1$ to $n$
4      for $j = 1$ to $n$
5          $l'_{ij} = \infty$
6          for $k = 1$ to $n$
7              $l'_{ij} = \min(l'_{ij}, l_{ik} + w_{kj})$
8  return $L'$


The procedure computes a matrix $L' = (l'_{ij})$, which it returns at the end. It does so
by computing equation (25.2) for all $i$ and $j$, using $L$ for $L^{(m-1)}$ and $L'$ for $L^{(m)}$.
(It is written without the superscripts to make its input and output matrices
independent of $m$.) Its running time is $\Theta(n^3)$ due to the three nested for loops.
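
A direct Python transcription of EXTEND-SHORTEST-PATHS might look like the sketch below. It assumes matrices are lists of lists with 0-indexed vertices and `math.inf` for missing edges; the sample matrix `W` is a hypothetical 4-vertex graph chosen for illustration, not one from the text.

```python
import math

INF = math.inf


def extend_shortest_paths(L, W):
    """Return L' with L'[i][j] = min over k of L[i][k] + W[k][j]  (eq. 25.2)."""
    n = len(L)
    L_new = [[INF] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                L_new[i][j] = min(L_new[i][j], L[i][k] + W[k][j])
    return L_new


# Hypothetical graph: 0->1 (3), 0->2 (8), 1->2 (-2), 2->3 (1), 3->0 (4).
W = [
    [0, 3, 8, INF],
    [INF, 0, -2, INF],
    [INF, INF, 0, 1],
    [4, INF, INF, 0],
]
L2 = extend_shortest_paths(W, W)   # shortest paths using at most 2 edges
print(L2[0][2])   # 1: the path 0 -> 1 -> 2 beats the direct edge of weight 8
print(L2[1][0])   # inf: reaching 0 from 1 needs 3 edges
```

Because every diagonal entry of `W` is 0, the case $k = j$ in the inner loop carries the old value `L[i][j]` forward, which is why the outer minimum in equation (25.2) can be dropped.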

Now we can see the relation to matrix multiplication. Suppose we wish to
compute the matrix product $C = A \cdot B$ of two $n \times n$ matrices $A$ and $B$. Then, for
$i, j = 1, 2, \ldots, n$, we compute

\[ c_{ij} = \sum_{k=1}^{n} a_{ik} \cdot b_{kj} . \tag{25.4} \]

Observe that if we make the substitutions

\[ \begin{aligned}
l^{(m-1)} &\to a , \\
w &\to b , \\
l^{(m)} &\to c , \\
\min &\to + , \\
+ &\to \cdot
\end{aligned} \]

in equation (25.2), we obtain equation (25.4). Thus, if we make these changes to
EXTEND-SHORTEST-PATHS and also replace $\infty$ (the identity for min) by 0 (the
identity for +), we obtain the same $\Theta(n^3)$-time procedure for multiplying square
matrices that we saw in Section 4.2:

SQUARE-MATRIX-MULTIPLY($A$, $B$)
1  $n = A.rows$
2  let $C$ be a new $n \times n$ matrix
3  for $i = 1$ to $n$
4      for $j = 1$ to $n$
5          $c_{ij} = 0$
6          for $k = 1$ to $n$
7              $c_{ij} = c_{ij} + a_{ik} \cdot b_{kj}$
8  return $C$

Returning to the all-pairs shortest-paths problem, we compute the shortest-path
weights by extending shortest paths edge by edge. Letting $A \cdot B$ denote the
matrix “product” returned by EXTEND-SHORTEST-PATHS($A$, $B$), we compute the
sequence of $n - 1$ matrices

\[ \begin{aligned}
L^{(1)} &= L^{(0)} \cdot W = W , \\
L^{(2)} &= L^{(1)} \cdot W = W^2 , \\
L^{(3)} &= L^{(2)} \cdot W = W^3 , \\
&\;\;\vdots \\
L^{(n-1)} &= L^{(n-2)} \cdot W = W^{n-1} .
\end{aligned} \]

As we argued above, the matrix $L^{(n-1)} = W^{n-1}$ contains the shortest-path weights.
The following procedure computes this sequence in $\Theta(n^4)$ time.

SLOW-ALL-PAIRS-SHORTEST-PATHS($W$)
1  $n = W.rows$
2  $L^{(1)} = W$
3  for $m = 2$ to $n - 1$
4      let $L^{(m)}$ be a new $n \times n$ matrix
5      $L^{(m)}$ = EXTEND-SHORTEST-PATHS($L^{(m-1)}$, $W$)
6  return $L^{(n-1)}$

Figure 25.1 shows a graph and the matrices $L^{(m)}$ computed by the procedure
SLOW-ALL-PAIRS-SHORTEST-PATHS.

Figure 25.1  A directed graph and the sequence of matrices $L^{(m)}$ computed by SLOW-ALL-PAIRS-
SHORTEST-PATHS. You might want to verify that $L^{(5)}$, defined as $L^{(4)} \cdot W$, equals $L^{(4)}$, and thus
$L^{(m)} = L^{(4)}$ for all $m \ge 4$.

Improving the running time

Our goal, however, is not to compute all the $L^{(m)}$ matrices: we are interested
only in matrix $L^{(n-1)}$. Recall that in the absence of negative-weight cycles,
equation (25.3) implies $L^{(m)} = L^{(n-1)}$ for all integers $m \ge n - 1$. Just as
traditional matrix multiplication is associative, so is matrix multiplication defined by
the EXTEND-SHORTEST-PATHS procedure (see Exercise 25.1-4). Therefore, we
can compute $L^{(n-1)}$ with only $\lceil \lg(n - 1) \rceil$ matrix products by computing the
sequence


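\[ \begin{aligned}
L^{(1)} &= W , \\
L^{(2)} &= W^2 = W \cdot W , \\
L^{(4)} &= W^4 = W^2 \cdot W^2 , \\
L^{(8)} &= W^8 = W^4 \cdot W^4 , \\
&\;\;\vdots \\
L^{(2^{\lceil \lg(n-1) \rceil})} &= W^{2^{\lceil \lg(n-1) \rceil}} = W^{2^{\lceil \lg(n-1) \rceil - 1}} \cdot W^{2^{\lceil \lg(n-1) \rceil - 1}} .
\end{aligned} \]

Since $2^{\lceil \lg(n-1) \rceil} \ge n - 1$, the final product $L^{(2^{\lceil \lg(n-1) \rceil})}$ is equal to $L^{(n-1)}$.

Figure 25.2  A weighted, directed graph for use in Exercises 25.1-1, 25.2-1, and 25.3-1.

The following procedure computes the above sequence of matrices by using this
technique of repeated squaring.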


FASTER-ALL-PAIRS-SHORTEST-PATHS($W$)
1  $n = W.rows$
2  $L^{(1)} = W$
3  $m = 1$
4  while $m < n - 1$
5      let $L^{(2m)}$ be a new $n \times n$ matrix
6      $L^{(2m)}$ = EXTEND-SHORTEST-PATHS($L^{(m)}$, $L^{(m)}$)
7      $m = 2m$
8  return $L^{(m)}$

In each iteration of the while loop of lines 4-7, we compute $L^{(2m)} = (L^{(m)})^2$,
starting with $m = 1$. At the end of each iteration, we double the value
of $m$. The final iteration computes $L^{(n-1)}$ by actually computing $L^{(2m)}$ for some
$n - 1 \le 2m < 2n - 2$. By equation (25.3), $L^{(2m)} = L^{(n-1)}$. The next time the test
in line 4 is performed, $m$ has been doubled, so now $m \ge n - 1$, the test fails, and
the procedure returns the last matrix it computed.

Because each of the $\lceil \lg(n - 1) \rceil$ matrix products takes $\Theta(n^3)$ time, FASTER-
ALL-PAIRS-SHORTEST-PATHS runs in $\Theta(n^3 \lg n)$ time. Observe that the code
is tight, containing no elaborate data structures, and the constant hidden in the
$\Theta$-notation is therefore small.
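
To compare the two strategies, the following Python sketch implements both the edge-by-edge loop and repeated squaring and checks that they agree. It is an illustration under stated assumptions: vertices are 0-indexed, `math.inf` marks missing edges, the function names are hypothetical, and the sample matrix `W` is a made-up 4-vertex graph with no negative-weight cycle. Unlike the pseudocode, the sketch keeps only the most recent matrix rather than naming every iterate.

```python
import math

INF = math.inf


def extend_shortest_paths(L, W):
    """Min-plus 'product': result[i][j] = min over k of L[i][k] + W[k][j]."""
    n = len(L)
    return [[min(L[i][k] + W[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def slow_apsp(W):
    """Extend by one edge at a time: n - 2 products, Theta(n^4) overall."""
    L = W
    for _ in range(2, len(W)):          # m = 2, 3, ..., n - 1
        L = extend_shortest_paths(L, W)
    return L


def faster_apsp(W):
    """Repeated squaring: ceil(lg(n - 1)) products, Theta(n^3 lg n) overall."""
    L, m = W, 1
    while m < len(W) - 1:
        L = extend_shortest_paths(L, L)  # L^(2m) = L^(m) * L^(m)
        m *= 2
    return L


# Hypothetical graph: 0->1 (3), 0->2 (8), 1->2 (-2), 2->3 (1), 3->0 (4).
W = [
    [0, 3, 8, INF],
    [INF, 0, -2, INF],
    [INF, INF, 0, 1],
    [4, INF, INF, 0],
]
D_slow = slow_apsp(W)
D_fast = faster_apsp(W)
print(D_fast)   # [[0, 3, 1, 2], [3, 0, -2, -1], [5, 8, 0, 1], [4, 7, 5, 0]]
```

For this 4-vertex input the slow version performs two products ($W^2$, $W^3$) and the fast version also performs two ($W^2$, $W^4$); the gap only opens up as $n$ grows, since $n - 2$ grows linearly while $\lceil \lg(n - 1) \rceil$ grows logarithmically.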

Exercises 


25.1-1 

Run SLOW-ALL-PAIRS-SHORTEST-PATHS on the weighted, directed graph of
Figure 25.2, showing the matrices that result for each iteration of the loop. Then
do the same for FASTER-ALL-PAIRS-SHORTEST-PATHS.


25.1-2 

Why do we require that $w_{ii} = 0$ for all $1 \le i \le n$?

25.1-3


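What does the matrix

\[ L^{(0)} = \begin{pmatrix}
0 & \infty & \infty & \cdots & \infty \\
\infty & 0 & \infty & \cdots & \infty \\
\infty & \infty & 0 & \cdots & \infty \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\infty & \infty & \infty & \cdots & 0
\end{pmatrix} \]

used in the shortest-paths algorithms correspond to in regular matrix
multiplication?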

25.1-4 

Show  that  matrix  multiplication  defined  by  Extend-Shortest-Paths  is  asso¬ 
ciative. 


25.1-5

Show  how  to  express  the  single-source  shortest-paths  problem  as  a  product  of  ma¬ 
trices  and  a  vector.  Describe  how  evaluating  this  product  corresponds  to  a  Bellman- 
Ford-like  algorithm  (see  Section  24.1). 

25.1-6

Suppose we also wish to compute the vertices on shortest paths in the algorithms of
this section. Show how to compute the predecessor matrix $\Pi$ from the completed
matrix $L$ of shortest-path weights in $O(n^3)$ time.


25.1-7 

We can also compute the vertices on shortest paths as we compute the shortest-
path weights. Define $\pi_{ij}^{(m)}$ as the predecessor of vertex $j$ on any minimum-weight
path from $i$ to $j$ that contains at most $m$ edges. Modify the EXTEND-SHORTEST-
PATHS and SLOW-ALL-PAIRS-SHORTEST-PATHS procedures to compute the
matrices $\Pi^{(1)}, \Pi^{(2)}, \ldots, \Pi^{(n-1)}$ as the matrices $L^{(1)}, L^{(2)}, \ldots, L^{(n-1)}$ are computed.


25.1-8 

The FASTER-ALL-PAIRS-SHORTEST-PATHS procedure, as written, requires us to
store $\lceil \lg(n - 1) \rceil$ matrices, each with $n^2$ elements, for a total space requirement of
$\Theta(n^2 \lg n)$. Modify the procedure to require only $\Theta(n^2)$ space by using only two
$n \times n$ matrices.


25.1-9 

Modify  Faster-All-Pairs-Shortest-Paths  so  that  it  can  determine  whether 
the graph contains a negative-weight cycle.

25.1-10

Give  an  efficient  algorithm  to  find  the  length  (number  of  edges)  of  a  minimum- 
length  negative-weight  cycle  in  a  graph. 


25.2  The Floyd-Warshall algorithm

In this section, we shall use a different dynamic-programming formulation to solve
the all-pairs shortest-paths problem on a directed graph $G = (V, E)$. The
resulting algorithm, known as the Floyd-Warshall algorithm, runs in $\Theta(V^3)$ time. As
before, negative-weight edges may be present, but we assume that there are no
negative-weight cycles. As in Section 25.1, we follow the dynamic-programming
process to develop the algorithm. After studying the resulting algorithm, we
present a similar method for finding the transitive closure of a directed graph.

The  structure  of  a  shortest  path 

In the Floyd-Warshall algorithm, we characterize the structure of a shortest path
differently from how we characterized it in Section 25.1. The Floyd-Warshall
algorithm considers the intermediate vertices of a shortest path, where an intermediate
vertex of a simple path $p = \langle v_1, v_2, \ldots, v_l \rangle$ is any vertex of $p$ other than $v_1$ or $v_l$,
that is, any vertex in the set $\{v_2, v_3, \ldots, v_{l-1}\}$.

The Floyd-Warshall algorithm relies on the following observation. Under our
assumption that the vertices of $G$ are $V = \{1, 2, \ldots, n\}$, let us consider a subset
$\{1, 2, \ldots, k\}$ of vertices for some $k$. For any pair of vertices $i, j \in V$, consider all
paths from $i$ to $j$ whose intermediate vertices are all drawn from $\{1, 2, \ldots, k\}$, and
let $p$ be a minimum-weight path from among them. (Path $p$ is simple.) The Floyd-
Warshall algorithm exploits a relationship between path $p$ and shortest paths from $i$
to $j$ with all intermediate vertices in the set $\{1, 2, \ldots, k - 1\}$. The relationship
depends on whether or not $k$ is an intermediate vertex of path $p$.

•  If $k$ is not an intermediate vertex of path $p$, then all intermediate vertices of
path $p$ are in the set $\{1, 2, \ldots, k - 1\}$. Thus, a shortest path from vertex $i$
to vertex $j$ with all intermediate vertices in the set $\{1, 2, \ldots, k - 1\}$ is also a
shortest path from $i$ to $j$ with all intermediate vertices in the set $\{1, 2, \ldots, k\}$.

•  If $k$ is an intermediate vertex of path $p$, then we decompose $p$ into $i \overset{p_1}{\leadsto} k \overset{p_2}{\leadsto} j$,
as Figure 25.3 illustrates. By Lemma 24.1, $p_1$ is a shortest path from $i$ to $k$
with all intermediate vertices in the set $\{1, 2, \ldots, k\}$. In fact, we can make a
slightly stronger statement. Because vertex $k$ is not an intermediate vertex of
path $p_1$, all intermediate vertices of $p_1$ are in the set $\{1, 2, \ldots, k - 1\}$.
Therefore, $p_1$ is a shortest path from $i$ to $k$ with all intermediate vertices in the set
$\{1, 2, \ldots, k - 1\}$. Similarly, $p_2$ is a shortest path from vertex $k$ to vertex $j$ with
all intermediate vertices in the set $\{1, 2, \ldots, k - 1\}$.

Figure 25.3  Path $p$ is a shortest path from vertex $i$ to vertex $j$, and $k$ is the highest-numbered
intermediate vertex of $p$. Path $p_1$, the portion of path $p$ from vertex $i$ to vertex $k$, has all intermediate
vertices in the set $\{1, 2, \ldots, k - 1\}$. The same holds for path $p_2$ from vertex $k$ to vertex $j$.


A  recursive  solution  to  the  all-pairs  shortest-paths  problem 

Based on the above observations, we define a recursive formulation of shortest-
path estimates that differs from the one in Section 25.1. Let $d_{ij}^{(k)}$ be the weight
of a shortest path from vertex $i$ to vertex $j$ for which all intermediate vertices
are in the set $\{1, 2, \ldots, k\}$. When $k = 0$, a path from vertex $i$ to vertex $j$ with
no intermediate vertex numbered higher than 0 has no intermediate vertices at all.
Such a path has at most one edge, and hence $d_{ij}^{(0)} = w_{ij}$. Following the above
discussion, we define $d_{ij}^{(k)}$ recursively by

\[ d_{ij}^{(k)} = \begin{cases}
w_{ij} & \text{if } k = 0 , \\
\min\bigl( d_{ij}^{(k-1)}, d_{ik}^{(k-1)} + d_{kj}^{(k-1)} \bigr) & \text{if } k \ge 1 .
\end{cases} \tag{25.5} \]

Because for any path, all intermediate vertices are in the set $\{1, 2, \ldots, n\}$, the
matrix $D^{(n)} = (d_{ij}^{(n)})$ gives the final answer: $d_{ij}^{(n)} = \delta(i, j)$ for all $i, j \in V$.


Computing  the  shortest-path  weights  bottom  up 

Based on recurrence (25.5), we can use the following bottom-up procedure to
compute the values $d_{ij}^{(k)}$ in order of increasing values of $k$. Its input is an $n \times n$ matrix $W$
defined as in equation (25.1). The procedure returns the matrix $D^{(n)}$ of shortest-
path weights.


FLOYD-WARSHALL($W$)
1  $n = W.rows$
2  $D^{(0)} = W$
3  for $k = 1$ to $n$
4      let $D^{(k)} = (d_{ij}^{(k)})$ be a new $n \times n$ matrix
5      for $i = 1$ to $n$
6          for $j = 1$ to $n$
7              $d_{ij}^{(k)} = \min(d_{ij}^{(k-1)}, d_{ik}^{(k-1)} + d_{kj}^{(k-1)})$
8  return $D^{(n)}$

Figure 25.4 shows the matrices $D^{(k)}$ computed by the Floyd-Warshall algorithm
for the graph in Figure 25.1.

The running time of the Floyd-Warshall algorithm is determined by the triply
nested for loops of lines 3-7. Because each execution of line 7 takes $O(1)$ time,
the algorithm runs in time $\Theta(n^3)$. As in the final algorithm in Section 25.1, the
code is tight, with no elaborate data structures, and so the constant hidden in the
$\Theta$-notation is small. Thus, the Floyd-Warshall algorithm is quite practical for even
moderate-sized input graphs.
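To make the bottom-up procedure concrete, here is a minimal Python sketch of it. It is an illustration under our own assumptions rather than part of the text: vertices are numbered from 0 instead of 1, the matrix is a list of lists, `math.inf` stands for a missing edge, and the names `floyd_warshall` and `INF` are hypothetical.

```python
import math

INF = math.inf  # assumed stand-in for "no edge"


def floyd_warshall(W):
    """Return the matrix of shortest-path weights for the weight matrix W.

    W[i][j] is the weight of edge (i, j), 0 on the diagonal, and INF when
    there is no edge. The graph must not contain a negative-weight cycle.
    """
    n = len(W)
    D = [row[:] for row in W]  # D^(0) = W
    for k in range(n):
        # Build the next matrix from the previous one, as lines 4-7 do.
        D_next = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                D_next[i][j] = min(D[i][j], D[i][k] + D[k][j])
        D = D_next
    return D


# A small example graph chosen for illustration (not one of the figures).
W = [
    [0, 3, 8, INF],
    [INF, 0, -2, INF],
    [INF, INF, 0, 1],
    [4, INF, INF, 0],
]
for row in floyd_warshall(W):
    print(row)
```

The sketch allocates a fresh matrix for every value of k to mirror the pseudocode line by line; keeping only the previous matrix already brings the space down from n + 1 matrices to two.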

Constructing  a  shortest  path 

There are a variety of different methods for constructing shortest paths in the
Floyd-Warshall algorithm. One way is to compute the matrix $D$ of shortest-path weights
and then construct the predecessor matrix $\Pi$ from the $D$ matrix. Exercise 25.1-6
asks you to implement this method so that it runs in $O(n^3)$ time. Given the predecessor
matrix $\Pi$, the PRINT-ALL-PAIRS-SHORTEST-PATH procedure will print
the vertices on a given shortest path.

Alternatively, we can compute the predecessor matrix $\Pi$ while the algorithm
computes the matrices $D^{(k)}$. Specifically, we compute a sequence of matrices
$\Pi^{(0)}, \Pi^{(1)}, \ldots, \Pi^{(n)}$, where $\Pi = \Pi^{(n)}$ and we define $\pi_{ij}^{(k)}$ as the predecessor of
vertex $j$ on a shortest path from vertex $i$ with all intermediate vertices in the set
$\{1, 2, \ldots, k\}$.

We can give a recursive formulation of $\pi_{ij}^{(k)}$. When $k = 0$, a shortest path from $i$
to $j$ has no intermediate vertices at all. Thus,

$$\pi_{ij}^{(0)} = \begin{cases} \text{NIL} & \text{if } i = j \text{ or } w_{ij} = \infty, \\ i & \text{if } i \ne j \text{ and } w_{ij} < \infty. \end{cases} \tag{25.6}$$

For $k \ge 1$, if we take the path $i \leadsto k \leadsto j$, where $k \ne j$, then the predecessor
of $j$ we choose is the same as the predecessor of $j$ we chose on a shortest path
from $k$ with all intermediate vertices in the set $\{1, 2, \ldots, k-1\}$. Otherwise, we
choose the same predecessor of $j$ that we chose on a shortest path from $i$ with all
intermediate vertices in the set $\{1, 2, \ldots, k-1\}$. Formally, for $k \ge 1$,

$$\pi_{ij}^{(k)} = \begin{cases} \pi_{ij}^{(k-1)} & \text{if } d_{ij}^{(k-1)} \le d_{ik}^{(k-1)} + d_{kj}^{(k-1)}, \\ \pi_{kj}^{(k-1)} & \text{if } d_{ij}^{(k-1)} > d_{ik}^{(k-1)} + d_{kj}^{(k-1)}. \end{cases} \tag{25.7}$$

Figure 25.4 The sequence of matrices $D^{(k)}$ and $\Pi^{(k)}$ computed by the Floyd-Warshall algorithm
for the graph in Figure 25.1.


We leave the incorporation of the $\Pi^{(k)}$ matrix computations into the
FLOYD-WARSHALL procedure as Exercise 25.2-3. Figure 25.4 shows the sequence of $\Pi^{(k)}$
matrices that the resulting algorithm computes for the graph of Figure 25.1. The
exercise also asks for the more difficult task of proving that the predecessor subgraph
$G_{\pi,i}$ is a shortest-paths tree with root $i$. Exercise 25.2-7 asks for yet another
way to reconstruct shortest paths.
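As an illustration of how equations (25.6) and (25.7) could be carried alongside the distance computation, the following Python sketch maintains a predecessor matrix and then walks it backward to recover a path. It is one possible rendering under our own assumptions, not the book's solution to the exercise: vertices are numbered from 0, `None` plays the role of NIL, the graph has no negative-weight cycle, and the names `floyd_warshall_with_predecessors` and `shortest_path` are hypothetical.

```python
import math

INF = math.inf  # assumed stand-in for "no edge"


def floyd_warshall_with_predecessors(W):
    """Return (D, P): shortest-path weights and a predecessor matrix."""
    n = len(W)
    D = [row[:] for row in W]
    # Equation (25.6): NIL if i == j or there is no edge (i, j), else i.
    P = [[None if i == j or W[i][j] == INF else i for j in range(n)]
         for i in range(n)]
    for k in range(n):
        D_next = [row[:] for row in D]
        P_next = [row[:] for row in P]
        for i in range(n):
            for j in range(n):
                via_k = D[i][k] + D[k][j]
                if D[i][j] > via_k:  # strict inequality, as in (25.7)
                    D_next[i][j] = via_k
                    P_next[i][j] = P[k][j]
        D, P = D_next, P_next
    return D, P


def shortest_path(P, i, j):
    """Return the vertices of a shortest path from i to j, or None."""
    if i == j:
        return [i]
    if P[i][j] is None:
        return None  # j is not reachable from i
    path = [j]
    while j != i:
        j = P[i][j]  # step back to the predecessor of j on a path from i
        path.append(j)
    path.reverse()
    return path
```

On the example matrix `W = [[0, 3, 8, INF], [INF, 0, -2, INF], [INF, INF, 0, 1], [4, INF, INF, 0]]`, calling `shortest_path(P, 0, 3)` on the returned predecessor matrix walks back through vertices 2 and 1 before reaching 0.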


Transitive  closure  of  a  directed  graph 

Given a directed graph $G = (V, E)$ with vertex set $V = \{1, 2, \ldots, n\}$, we might
wish to determine whether $G$ contains a path from $i$ to $j$ for all vertex pairs
$i, j \in V$. We define the transitive closure of $G$ as the graph $G^* = (V, E^*)$, where

$E^* = \{(i, j) : \text{there is a path from vertex } i \text{ to vertex } j \text{ in } G\}$.

One way to compute the transitive closure of a graph in $\Theta(n^3)$ time is to assign
a weight of 1 to each edge of $E$ and run the Floyd-Warshall algorithm. If there is a
path from vertex $i$ to vertex $j$, we get $d_{ij} < n$. Otherwise, we get $d_{ij} = \infty$.

There is another, similar way to compute the transitive closure of $G$ in $\Theta(n^3)$
time that can save time and space in practice. This method substitutes the logical
operations $\lor$ (logical OR) and $\land$ (logical AND) for the arithmetic operations min
and + in the Floyd-Warshall algorithm. For $i, j, k = 1, 2, \ldots, n$, we define $t_{ij}^{(k)}$ to
be 1 if there exists a path in graph $G$ from vertex $i$ to vertex $j$ with all intermediate
vertices in the set $\{1, 2, \ldots, k\}$, and 0 otherwise. We construct the transitive closure
$G^* = (V, E^*)$ by putting edge $(i, j)$ into $E^*$ if and only if $t_{ij}^{(n)} = 1$. A recursive
definition of $t_{ij}^{(k)}$, analogous to recurrence (25.5), is

$$t_{ij}^{(0)} = \begin{cases} 0 & \text{if } i \ne j \text{ and } (i, j) \notin E, \\ 1 & \text{if } i = j \text{ or } (i, j) \in E, \end{cases}$$

and for $k \ge 1$,

$$t_{ij}^{(k)} = t_{ij}^{(k-1)} \lor \bigl(t_{ik}^{(k-1)} \land t_{kj}^{(k-1)}\bigr). \tag{25.8}$$

As in the Floyd-Warshall algorithm, we compute the matrices $T^{(k)} = (t_{ij}^{(k)})$ in
order of increasing $k$.
Figure 25.5 A directed graph and the matrices $T^{(k)}$ computed by the transitive-closure algorithm.


TRANSITIVE-CLOSURE(G)

$n = |G.V|$
let $T^{(0)} = (t_{ij}^{(0)})$ be a new $n \times n$ matrix
for $i = 1$ to $n$
    for $j = 1$ to $n$
        if $i == j$ or $(i, j) \in G.E$
            $t_{ij}^{(0)} = 1$
        else $t_{ij}^{(0)} = 0$
for $k = 1$ to $n$
    let $T^{(k)} = (t_{ij}^{(k)})$ be a new $n \times n$ matrix
    for $i = 1$ to $n$
        for $j = 1$ to $n$
            $t_{ij}^{(k)} = t_{ij}^{(k-1)} \lor (t_{ik}^{(k-1)} \land t_{kj}^{(k-1)})$
return $T^{(n)}$


Figure 25.5 shows the matrices $T^{(k)}$ computed by the TRANSITIVE-CLOSURE
procedure on a sample graph. The TRANSITIVE-CLOSURE procedure, like the
Floyd-Warshall algorithm, runs in $\Theta(n^3)$ time. On some computers, though, logical
operations on single-bit values execute faster than arithmetic operations on
integer words of data. Moreover, because the direct transitive-closure algorithm
uses only boolean values rather than integer values, its space requirement is less
than the Floyd-Warshall algorithm’s by a factor corresponding to the size of a word
of computer storage.
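The boolean recurrence can be sketched in Python as follows. The first function follows recurrence (25.8) directly; the second is a variant of our own that packs each row into one Python integer and updates rows in place, to illustrate the remark about single-bit operations. Both assume vertices numbered from 0 and an edge list of pairs, and the names `transitive_closure` and `transitive_closure_bitset` are hypothetical.

```python
def transitive_closure(n, edges):
    """Return T where T[i][j] is True iff G has a path from i to j."""
    # T^(0): True on the diagonal and for every edge of G.
    T = [[i == j for j in range(n)] for i in range(n)]
    for u, v in edges:
        T[u][v] = True
    for k in range(n):
        # Recurrence (25.8), building the next matrix from the previous one.
        T = [[T[i][j] or (T[i][k] and T[k][j]) for j in range(n)]
             for i in range(n)]
    return T


def transitive_closure_bitset(n, edges):
    """Same result, with row i stored as an integer whose bit j is t_ij."""
    rows = [1 << i for i in range(n)]
    for u, v in edges:
        rows[u] |= 1 << v
    for k in range(n):
        for i in range(n):
            if (rows[i] >> k) & 1:   # i reaches k ...
                rows[i] |= rows[k]   # ... so i reaches whatever k reaches
    return rows


edges = [(1, 2), (2, 1), (1, 3), (3, 0)]
for row in transitive_closure(4, edges):
    print([int(t) for t in row])
```

Updating the rows in place is safe in the second function because row k itself does not change during iteration k.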

Exercises 


25.2-1 

Run the Floyd-Warshall algorithm on the weighted, directed graph of Figure 25.2.
Show the matrix $D^{(k)}$ that results for each iteration of the outer loop.


25.2-2 

Show  how  to  compute  the  transitive  closure  using  the  technique  of  Section  25.1. 


25.2-3

Modify the FLOYD-WARSHALL procedure to compute the $\Pi^{(k)}$ matrices according
to equations (25.6) and (25.7). Prove rigorously that for all $i \in V$, the predecessor
subgraph $G_{\pi,i}$ is a shortest-paths tree with root $i$. (Hint: To show that $G_{\pi,i}$ is
acyclic, first show that $\pi_{ij}^{(k)} = l$ implies $d_{ij}^{(k)} \ge d_{il}^{(k)} + w_{lj}$, according to the
definition of $\pi_{ij}^{(k)}$. Then, adapt the proof of Lemma 24.16.)

25.2-4

As it appears above, the Floyd-Warshall algorithm requires $\Theta(n^3)$ space, since we
compute $d_{ij}^{(k)}$ for $i, j, k = 1, 2, \ldots, n$. Show that the following procedure, which
simply drops all the superscripts, is correct, and thus only $\Theta(n^2)$ space is required.

FLOYD-WARSHALL′(W)

$n = W.rows$
$D = W$
for $k = 1$ to $n$
    for $i = 1$ to $n$
        for $j = 1$ to $n$
            $d_{ij} = \min(d_{ij},\ d_{ik} + d_{kj})$
return $D$


25.2-5 

Suppose that we modify the way in which equation (25.7) handles equality:

$$\pi_{ij}^{(k)} = \begin{cases} \pi_{ij}^{(k-1)} & \text{if } d_{ij}^{(k-1)} < d_{ik}^{(k-1)} + d_{kj}^{(k-1)}, \\ \pi_{kj}^{(k-1)} & \text{if } d_{ij}^{(k-1)} \ge d_{ik}^{(k-1)} + d_{kj}^{(k-1)}. \end{cases}$$

Is this alternative definition of the predecessor matrix $\Pi$ correct?




25.2-6 

How can we use the output of the Floyd-Warshall algorithm to detect the presence
of a negative-weight cycle?


25.2-7 

Another way to reconstruct shortest paths in the Floyd-Warshall algorithm uses
values $\phi_{ij}^{(k)}$ for $i, j, k = 1, 2, \ldots, n$, where $\phi_{ij}^{(k)}$ is the highest-numbered intermediate
vertex of a shortest path from $i$ to $j$ in which all intermediate vertices are
in the set $\{1, 2, \ldots, k\}$. Give a recursive formulation for $\phi_{ij}^{(k)}$, modify the
FLOYD-WARSHALL procedure to compute the $\phi_{ij}^{(k)}$ values, and rewrite the
PRINT-ALL-PAIRS-SHORTEST-PATH procedure to take the matrix $\Phi = (\phi_{ij}^{(n)})$ as an input.
How is the matrix $\Phi$ like the $s$ table in the matrix-chain multiplication problem of
Section 15.2?


25.2-8 

Give an $O(VE)$-time algorithm for computing the transitive closure of a directed
graph $G = (V, E)$.


25.2-9 

Suppose that we can compute the transitive closure of a directed acyclic graph in
$f(|V|, |E|)$ time, where $f$ is a monotonically increasing function of $|V|$ and $|E|$.
Show that the time to compute the transitive closure $G^* = (V, E^*)$ of a general
directed graph $G = (V, E)$ is then $f(|V|, |E|) + O(V + E^*)$.


25.3  Johnson’s  algorithm  for  sparse  graphs 

Johnson’s algorithm finds shortest paths between all pairs in $O(V^2 \lg V + VE)$
time. For sparse graphs, it is asymptotically faster than either repeated squaring of
matrices or the Floyd-Warshall algorithm. The algorithm either returns a matrix of
shortest-path weights for all pairs of vertices or reports that the input graph contains
a negative-weight cycle. Johnson’s algorithm uses as subroutines both Dijkstra’s
algorithm and the Bellman-Ford algorithm, which Chapter 24 describes.

Johnson’s algorithm uses the technique of reweighting, which works as follows.
If all edge weights $w$ in a graph $G = (V, E)$ are nonnegative, we can find shortest
paths between all pairs of vertices by running Dijkstra’s algorithm once from
each vertex; with the Fibonacci-heap min-priority queue, the running time of this
all-pairs algorithm is $O(V^2 \lg V + VE)$. If $G$ has negative-weight edges but no
negative-weight cycles, we simply compute a new set of nonnegative edge weights
that allows us to use the same method. The new set of edge weights $\hat{w}$ must satisfy
two important properties:

1. For all pairs of vertices $u, v \in V$, a path $p$ is a shortest path from $u$ to $v$ using
weight function $w$ if and only if $p$ is also a shortest path from $u$ to $v$ using
weight function $\hat{w}$.

2. For all edges $(u, v)$, the new weight $\hat{w}(u, v)$ is nonnegative.

As we shall see in a moment, we can preprocess $G$ to determine the new weight
function $\hat{w}$ in $O(VE)$ time.

Preserving  shortest  paths  by  reweighting 

The following lemma shows how easily we can reweight the edges to satisfy the
first property above. We use $\delta$ to denote shortest-path weights derived from weight
function $w$ and $\hat{\delta}$ to denote shortest-path weights derived from weight function $\hat{w}$.

Lemma 25.1 (Reweighting does not change shortest paths)

Given a weighted, directed graph $G = (V, E)$ with weight function $w : E \to \mathbb{R}$,
let $h : V \to \mathbb{R}$ be any function mapping vertices to real numbers. For each edge
$(u, v) \in E$, define

$$\hat{w}(u, v) = w(u, v) + h(u) - h(v). \tag{25.9}$$

Let $p = \langle v_0, v_1, \ldots, v_k \rangle$ be any path from vertex $v_0$ to vertex $v_k$. Then $p$ is a
shortest path from $v_0$ to $v_k$ with weight function $w$ if and only if it is a shortest path
with weight function $\hat{w}$. That is, $w(p) = \delta(v_0, v_k)$ if and only if $\hat{w}(p) = \hat{\delta}(v_0, v_k)$.
Furthermore, $G$ has a negative-weight cycle using weight function $w$ if and only
if $G$ has a negative-weight cycle using weight function $\hat{w}$.

Proof We start by showing that

$$\hat{w}(p) = w(p) + h(v_0) - h(v_k). \tag{25.10}$$

We have

$$\begin{aligned}
\hat{w}(p) &= \sum_{i=1}^{k} \hat{w}(v_{i-1}, v_i) \\
&= \sum_{i=1}^{k} \bigl(w(v_{i-1}, v_i) + h(v_{i-1}) - h(v_i)\bigr) \\
&= \sum_{i=1}^{k} w(v_{i-1}, v_i) + h(v_0) - h(v_k) \qquad \text{(because the sum telescopes)} \\
&= w(p) + h(v_0) - h(v_k).
\end{aligned}$$

Therefore, any path $p$ from $v_0$ to $v_k$ has $\hat{w}(p) = w(p) + h(v_0) - h(v_k)$. Because
$h(v_0)$ and $h(v_k)$ do not depend on the path, if one path from $v_0$ to $v_k$ is
shorter than another using weight function $w$, then it is also shorter using $\hat{w}$. Thus,
$w(p) = \delta(v_0, v_k)$ if and only if $\hat{w}(p) = \hat{\delta}(v_0, v_k)$.

Finally, we show that $G$ has a negative-weight cycle using weight function $w$ if
and only if $G$ has a negative-weight cycle using weight function $\hat{w}$. Consider any
cycle $c = \langle v_0, v_1, \ldots, v_k \rangle$, where $v_0 = v_k$. By equation (25.10),

$$\hat{w}(c) = w(c) + h(v_0) - h(v_k) = w(c),$$

and thus $c$ has negative weight using $w$ if and only if it has negative weight
using $\hat{w}$. ∎
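A small numerical check can make the telescoping argument tangible. The Python sketch below reweights a three-edge graph with an arbitrary function h and confirms that every path between the same two endpoints shifts by the same amount, h of the first vertex minus h of the last. The graph, the values of h, and the helper names `reweight` and `path_weight` are all made up for illustration.

```python
def reweight(weight, h):
    """Apply equation (25.9) to every edge in the dict weight[(u, v)]."""
    return {(u, v): w + h[u] - h[v] for (u, v), w in weight.items()}


def path_weight(weight, path):
    """Sum the edge weights along a path given as a list of vertices."""
    return sum(weight[(u, v)] for u, v in zip(path, path[1:]))


# Hypothetical example data: two different paths from "a" to "c".
weight = {("a", "b"): 2, ("b", "c"): -3, ("a", "c"): 1}
h = {"a": 5, "b": -1, "c": 2}   # any real-valued function works here
w_hat = reweight(weight, h)

for path in (["a", "b", "c"], ["a", "c"]):
    before = path_weight(weight, path)
    after = path_weight(w_hat, path)
    # Equation (25.10): the shift depends only on the endpoints.
    assert after == before + h[path[0]] - h[path[-1]]
    print(path, before, after)
```

Because both paths shift by the same amount, the path that is shorter under the original weights stays shorter after reweighting, which is the first property we need.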

Producing  nonnegative  weights  by  reweighting 

Our next goal is to ensure that the second property holds: we want $\hat{w}(u, v)$ to be
nonnegative for all edges $(u, v) \in E$. Given a weighted, directed graph $G = (V, E)$
with weight function $w : E \to \mathbb{R}$, we make a new graph $G' = (V', E')$,
where $V' = V \cup \{s\}$ for some new vertex $s \notin V$ and $E' = E \cup \{(s, v) : v \in V\}$.
We extend the weight function $w$ so that $w(s, v) = 0$ for all $v \in V$. Note that
because $s$ has no edges that enter it, no shortest paths in $G'$, other than those with
source $s$, contain $s$. Moreover, $G'$ has no negative-weight cycles if and only if $G$
has no negative-weight cycles. Figure 25.6(a) shows the graph $G'$ corresponding
to the graph $G$ of Figure 25.1.

Now suppose that $G$ and $G'$ have no negative-weight cycles. Let us define
$h(v) = \delta(s, v)$ for all $v \in V'$. By the triangle inequality (Lemma 24.10),
we have $h(v) \le h(u) + w(u, v)$ for all edges $(u, v) \in E'$. Thus, if we define
the new weights $\hat{w}$ by reweighting according to equation (25.9), we have
$\hat{w}(u, v) = w(u, v) + h(u) - h(v) \ge 0$, and we have satisfied the second property.
Figure 25.6(b) shows the graph $G'$ from Figure 25.6(a) with reweighted edges.

Computing  all-pairs  shortest  paths 

Johnson’s algorithm to compute all-pairs shortest paths uses the Bellman-Ford
algorithm (Section 24.1) and Dijkstra’s algorithm (Section 24.3) as subroutines. It
assumes implicitly that the edges are stored in adjacency lists. The algorithm returns
the usual $|V| \times |V|$ matrix $D = (d_{ij})$, where $d_{ij} = \delta(i, j)$, or it reports that
the input graph contains a negative-weight cycle. As is typical for an all-pairs
shortest-paths algorithm, we assume that the vertices are numbered from 1 to $|V|$.
Figure 25.6 Johnson’s all-pairs shortest-paths algorithm run on the graph of Figure 25.1. Vertex
numbers appear outside the vertices. (a) The graph $G'$ with the original weight function $w$.
The new vertex $s$ is black. Within each vertex $v$ is $h(v) = \delta(s, v)$. (b) After reweighting each
edge $(u, v)$ with weight function $\hat{w}(u, v) = w(u, v) + h(u) - h(v)$. (c)-(g) The result of running
Dijkstra’s algorithm on each vertex of $G$ using weight function $\hat{w}$. In each part, the source vertex $u$
is black, and shaded edges are in the shortest-paths tree computed by the algorithm. Within each
vertex $v$ are the values $\hat{\delta}(u, v)$ and $\delta(u, v)$, separated by a slash. The value $d_{uv} = \delta(u, v)$ is equal to
$\hat{\delta}(u, v) + h(v) - h(u)$.




JOHNSON(G, w)

1  compute $G'$, where $G'.V = G.V \cup \{s\}$,
       $G'.E = G.E \cup \{(s, v) : v \in G.V\}$, and
       $w(s, v) = 0$ for all $v \in G.V$
2  if BELLMAN-FORD($G'$, $w$, $s$) == FALSE
3      print “the input graph contains a negative-weight cycle”
4  else for each vertex $v \in G'.V$
5          set $h(v)$ to the value of $\delta(s, v)$
               computed by the Bellman-Ford algorithm
6      for each edge $(u, v) \in G'.E$
7          $\hat{w}(u, v) = w(u, v) + h(u) - h(v)$
8      let $D = (d_{uv})$ be a new $n \times n$ matrix
9      for each vertex $u \in G.V$
10         run DIJKSTRA($G$, $\hat{w}$, $u$) to compute $\hat{\delta}(u, v)$ for all $v \in G.V$
11         for each vertex $v \in G.V$
12             $d_{uv} = \hat{\delta}(u, v) + h(v) - h(u)$
13  return $D$

This code simply performs the actions we specified earlier. Line 1 produces $G'$.
Line 2 runs the Bellman-Ford algorithm on $G'$ with weight function $w$ and source
vertex $s$. If $G'$, and hence $G$, contains a negative-weight cycle, line 3 reports the
problem. Lines 4-12 assume that $G'$ contains no negative-weight cycles. Lines 4-5
set $h(v)$ to the shortest-path weight $\delta(s, v)$ computed by the Bellman-Ford algorithm
for all $v \in V'$. Lines 6-7 compute the new weights $\hat{w}$. For each pair of vertices
$u, v \in V$, the for loop of lines 9-12 computes the shortest-path weight $\hat{\delta}(u, v)$
by calling Dijkstra’s algorithm once from each vertex in $V$. Line 12 stores in
matrix entry $d_{uv}$ the correct shortest-path weight $\delta(u, v)$, calculated using
equation (25.10). Finally, line 13 returns the completed $D$ matrix. Figure 25.6 depicts
the execution of Johnson’s algorithm.

If we implement the min-priority queue in Dijkstra’s algorithm by a Fibonacci
heap, Johnson’s algorithm runs in $O(V^2 \lg V + VE)$ time. The simpler binary min-heap
implementation yields a running time of $O(VE \lg V)$, which is still asymptotically
faster than the Floyd-Warshall algorithm if the graph is sparse.
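The steps of JOHNSON can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not a definitive implementation: vertices are numbered from 0, the graph is an edge list of (u, v, w) triples, the extra source s is given the number n, and Dijkstra's algorithm uses the standard-library binary heap `heapq` with lazy deletion rather than a Fibonacci heap. The function names are hypothetical, and the function returns `None` instead of printing a message when it finds a negative-weight cycle.

```python
import heapq
import math

INF = math.inf


def bellman_ford(n, edges, source):
    """Return shortest-path weights from source, or None on a negative cycle."""
    dist = [INF] * n
    dist[source] = 0
    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None  # some edge can still be relaxed
    return dist


def dijkstra(n, adj, source):
    """Shortest-path weights from source; all weights must be nonnegative."""
    dist = [INF] * n
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def johnson(n, edges):
    """Return the all-pairs matrix D, or None if a negative cycle exists."""
    s = n  # the new source vertex of G'
    augmented = edges + [(s, v, 0) for v in range(n)]
    h = bellman_ford(n + 1, augmented, s)
    if h is None:
        return None
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w + h[u] - h[v]))  # reweighted, nonnegative
    D = []
    for u in range(n):
        d_hat = dijkstra(n, adj, u)
        # Undo the reweighting, as line 12 of JOHNSON does.
        D.append([d_hat[v] + h[v] - h[u] for v in range(n)])
    return D
```

With a binary heap this sketch corresponds to the $O(VE \lg V)$ bound rather than the Fibonacci-heap bound. A call such as `johnson(4, [(0, 1, 3), (0, 2, 8), (1, 2, -2), (2, 3, 1), (3, 0, 4)])` returns one row of shortest-path weights per source vertex.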

Exercises 


25.3-1 

Use Johnson’s algorithm to find the shortest paths between all pairs of vertices in
the graph of Figure 25.2. Show the values of $h$ and $\hat{w}$ computed by the algorithm.

25.3-2

What is the purpose of adding the new vertex $s$ to $V$, yielding $V'$?


25.3-3

Suppose that $w(u, v) \ge 0$ for all edges $(u, v) \in E$. What is the relationship
between the weight functions $w$ and $\hat{w}$?


25.3-4 

Professor Greenstreet claims that there is a simpler way to reweight edges than
the method used in Johnson’s algorithm. Letting $w^* = \min_{(u,v) \in E} \{w(u, v)\}$, just
define $\hat{w}(u, v) = w(u, v) - w^*$ for all edges $(u, v) \in E$. What is wrong with the
professor’s method of reweighting?


25.3-5 

Suppose that we run Johnson’s algorithm on a directed graph $G$ with weight
function $w$. Show that if $G$ contains a 0-weight cycle $c$, then $\hat{w}(u, v) = 0$ for every
edge $(u, v)$ in $c$.


25.3-6 

Professor Michener claims that there is no need to create a new source vertex in
line 1 of JOHNSON. He claims that instead we can just use $G' = G$ and let $s$ be any
vertex. Give an example of a weighted, directed graph $G$ for which incorporating
the professor’s idea into JOHNSON causes incorrect answers. Then show that if $G$
is strongly connected (every vertex is reachable from every other vertex), the results
returned by JOHNSON with the professor’s modification are correct.


Problems 


25-1  Transitive  closure  of  a  dynamic  graph 

Suppose  that  we  wish  to  maintain  the  transitive  closure  of  a  directed  graph  G  = 
(V,  E)  as  we  insert  edges  into  E.  That  is,  after  each  edge  has  been  inserted,  we 
want  to  update  the  transitive  closure  of  the  edges  inserted  so  far.  Assume  that  the 
graph  G  has  no  edges  initially  and  that  we  represent  the  transitive  closure  as  a 
boolean  matrix. 

a. Show how to update the transitive closure $G^* = (V, E^*)$ of a graph $G = (V, E)$
in $O(V^2)$ time when a new edge is added to $G$.

b. Give an example of a graph $G$ and an edge $e$ such that $\Omega(V^2)$ time is required
to update the transitive closure after the insertion of $e$ into $G$, no matter what
algorithm is used.

c. Describe an efficient algorithm for updating the transitive closure as edges are
inserted into the graph. For any sequence of $n$ insertions, your algorithm should
run in total time $\sum_{i=1}^{n} t_i = O(V^3)$, where $t_i$ is the time to update the transitive
closure upon inserting the $i$th edge. Prove that your algorithm attains this time
bound.

25-2 Shortest paths in ε-dense graphs

A graph $G = (V, E)$ is ε-dense if $|E| = \Theta(V^{1+\epsilon})$ for some constant $\epsilon$ in the
range $0 < \epsilon \le 1$. By using $d$-ary min-heaps (see Problem 6-2) in shortest-paths
algorithms on ε-dense graphs, we can match the running times of Fibonacci-heap-based
algorithms without using as complicated a data structure.

a. What are the asymptotic running times for INSERT, EXTRACT-MIN, and
DECREASE-KEY, as a function of $d$ and the number $n$ of elements in a $d$-ary
min-heap? What are these running times if we choose $d = \Theta(n^{\alpha})$ for some
constant $0 < \alpha \le 1$? Compare these running times to the amortized costs of
these operations for a Fibonacci heap.

b. Show how to compute shortest paths from a single source on an ε-dense directed
graph $G = (V, E)$ with no negative-weight edges in $O(E)$ time. (Hint: Pick $d$
as a function of $\epsilon$.)

c. Show how to solve the all-pairs shortest-paths problem on an ε-dense directed
graph $G = (V, E)$ with no negative-weight edges in $O(VE)$ time.

d. Show how to solve the all-pairs shortest-paths problem in $O(VE)$ time on an
ε-dense directed graph $G = (V, E)$ that may have negative-weight edges but
has no negative-weight cycles.


Chapter  notes 

Lawler [224] has a good discussion of the all-pairs shortest-paths problem,
although he does not analyze solutions for sparse graphs. He attributes the
matrix-multiplication algorithm to the folklore. The Floyd-Warshall algorithm is due to
Floyd [105], who based it on a theorem of Warshall [349] that describes how to
compute the transitive closure of boolean matrices. Johnson’s algorithm is taken
from [192].

Several researchers have given improved algorithms for computing shortest
paths via matrix multiplication. Fredman [111] shows how to solve the all-pairs
shortest paths problem using $O(V^{5/2})$ comparisons between sums of edge
weights and obtains an algorithm that runs in $O(V^3 (\lg\lg V / \lg V)^{1/3})$ time, which
is slightly better than the running time of the Floyd-Warshall algorithm. Han [159]
reduced the running time to $O(V^3 (\lg\lg V / \lg V)^{5/4})$. Another line of research
demonstrates that we can apply algorithms for fast matrix multiplication (see the
chapter notes for Chapter 4) to the all-pairs shortest paths problem. Let $O(n^{\omega})$ be
the running time of the fastest algorithm for multiplying $n \times n$ matrices; currently
$\omega < 2.376$ [78]. Galil and Margalit [123, 124] and Seidel [308] designed
algorithms that solve the all-pairs shortest paths problem in undirected, unweighted
graphs in $O(V^{\omega} p(V))$ time, where $p(n)$ denotes a particular function that is
polylogarithmically bounded in $n$. In dense graphs, these algorithms are faster than
the $O(VE)$ time needed to perform $|V|$ breadth-first searches. Several researchers
have extended these results to give algorithms for solving the all-pairs shortest
paths problem in undirected graphs in which the edge weights are integers in the
range $\{1, 2, \ldots, W\}$. The asymptotically fastest such algorithm, by Shoshan and
Zwick [316], runs in time $O(W V^{\omega} p(VW))$.

Karger, Koller, and Phillips [196] and independently McGeoch [247] have given
a time bound that depends on $E^*$, the set of edges in $E$ that participate in some
shortest path. Given a graph with nonnegative edge weights, their algorithms run in
$O(VE^* + V^2 \lg V)$ time and improve upon running Dijkstra’s algorithm $|V|$ times
when $|E^*| = o(E)$.

Baswana, Hariharan, and Sen [33] examined decremental algorithms for maintaining
all-pairs shortest paths and transitive-closure information. Decremental
algorithms allow a sequence of intermixed edge deletions and queries; by
comparison, Problem 25-1, in which edges are inserted, asks for an incremental
algorithm. The algorithms by Baswana, Hariharan, and Sen are randomized
and, when a path exists, their transitive-closure algorithm can fail to report it
with probability $1/n^c$ for an arbitrary $c > 0$. The query times are $O(1)$ with
high probability. For transitive closure, the amortized time for each update is
$O(V^{4/3} \lg^{1/3} V)$. For all-pairs shortest paths, the update times depend on the
queries. For queries just giving the shortest-path weights, the amortized time per
update is $O(V^3/E \lg^2 V)$. To report the actual shortest path, the amortized update
time is $\min(O(V^{3/2} \sqrt{\lg V}),\ O(V^3/E \lg^2 V))$. Demetrescu and Italiano [84]
showed how to handle update and query operations when edges are both inserted
and deleted, as long as each given edge has a bounded range of possible values
drawn from the real numbers.

Aho, Hopcroft, and Ullman [5] defined an algebraic structure known as a “closed
semiring,” which serves as a general framework for solving path problems in directed
graphs. Both the Floyd-Warshall algorithm and the transitive-closure algorithm
from Section 25.2 are instantiations of an all-pairs algorithm based on closed
semirings. Maggs and Plotkin [240] showed how to find minimum spanning trees
using a closed semiring.

Just as we can model a road map as a directed graph in order to find the shortest
path from one point to another, we can also interpret a directed graph as a “flow
network” and use it to answer questions about material flows. Imagine a material
coursing through a system from a source, where the material is produced, to
a sink, where it is consumed. The source produces the material at some steady
rate, and the sink consumes the material at the same rate. The “flow” of the material
at any point in the system is intuitively the rate at which the material moves.
Flow networks can model many problems, including liquids flowing through pipes,
parts through assembly lines, current through electrical networks, and information
through communication networks.

We can think of each directed edge in a flow network as a conduit for the material.
Each conduit has a stated capacity, given as a maximum rate at which the material
can flow through the conduit, such as 200 gallons of liquid per hour through
a pipe or 20 amperes of electrical current through a wire. Vertices are conduit
junctions, and other than the source and sink, material flows through the vertices
without collecting in them. In other words, the rate at which material enters a vertex
must equal the rate at which it leaves the vertex. We call this property “flow
conservation,” and it is equivalent to Kirchhoff’s current law when the material is
electrical current.

In  the  maximum-flow  problem,  we  wish  to  compute  the  greatest  rate  at  which 
we  can  ship  material  from  the  source  to  the  sink  without  violating  any  capacity 
constraints.  It  is  one  of  the  simplest  problems  concerning  flow  networks  and,  as 
we  shall  see  in  this  chapter,  this  problem  can  be  solved  by  efficient  algorithms. 
Moreover,  we  can  adapt  the  basic  techniques  used  in  maximum-flow  algorithms  to 
solve  other  network-flow  problems. 

This chapter presents two general methods for solving the maximum-flow problem.
Section 26.1 formalizes the notions of flow networks and flows, formally
defining the maximum-flow problem. Section 26.2 describes the classical method
of Ford and Fulkerson for finding maximum flows. An application of this method,
finding a maximum matching in an undirected bipartite graph, appears in
Section 26.3. Section 26.4 presents the push-relabel method, which underlies many of
the fastest algorithms for network-flow problems. Section 26.5 covers the
“relabel-to-front” algorithm, a particular implementation of the push-relabel method that
runs in time $O(V^3)$. Although this algorithm is not the fastest algorithm known,
it illustrates some of the techniques used in the asymptotically fastest algorithms,
and it is reasonably efficient in practice.


26.1  Flow  networks 

In  this  section,  we  give  a  graph-theoretic  definition  of  flow  networks,  discuss  their 
properties,  and  define  the  maximum-flow  problem  precisely.  We  also  introduce 
some  helpful  notation. 

Flow  networks  and  flows 

A flow network $G = (V, E)$ is a directed graph in which each edge $(u, v) \in E$
has a nonnegative capacity $c(u, v) \ge 0$. We further require that if $E$ contains an
edge $(u, v)$, then there is no edge $(v, u)$ in the reverse direction. (We shall see
shortly how to work around this restriction.) If $(u, v) \notin E$, then for convenience
we define $c(u, v) = 0$, and we disallow self-loops. We distinguish two vertices
in a flow network: a source $s$ and a sink $t$. For convenience, we assume that each
vertex lies on some path from the source to the sink. That is, for each vertex $v \in V$,
the flow network contains a path $s \leadsto v \leadsto t$. The graph is therefore connected
and, since each vertex other than $s$ has at least one entering edge, $|E| \ge |V| - 1$.
Figure 26.1 shows an example of a flow network.

We are now ready to define flows more formally. Let $G = (V, E)$ be a flow
network with a capacity function $c$. Let $s$ be the source of the network, and let $t$ be
the sink. A flow in $G$ is a real-valued function $f : V \times V \to \mathbb{R}$ that satisfies the
following two properties:

Capacity constraint: For all $u, v \in V$, we require $0 \le f(u, v) \le c(u, v)$.

Flow conservation: For all $u \in V - \{s, t\}$, we require

$$\sum_{v \in V} f(v, u) = \sum_{v \in V} f(u, v).$$

When $(u, v) \notin E$, there can be no flow from $u$ to $v$, and $f(u, v) = 0$.

Figure 26.1 (a) A flow network $G = (V, E)$ for the Lucky Puck Company’s trucking problem.
The Vancouver factory is the source $s$, and the Winnipeg warehouse is the sink $t$. The company ships
pucks through intermediate cities, but only $c(u, v)$ crates per day can go from city $u$ to city $v$. Each
edge is labeled with its capacity. (b) A flow $f$ in $G$ with value $|f| = 19$. Each edge $(u, v)$ is labeled
by $f(u, v)/c(u, v)$. The slash notation merely separates the flow and capacity; it does not indicate
division.


We call the nonnegative quantity $f(u, v)$ the flow from vertex $u$ to vertex $v$. The
value $|f|$ of a flow $f$ is defined as

$$|f| = \sum_{v \in V} f(s, v) - \sum_{v \in V} f(v, s) , \tag{26.1}$$

that is, the total flow out of the source minus the flow into the source. (Here, the $|\cdot|$
notation denotes flow value, not absolute value or cardinality.) Typically, a flow
network will not have any edges into the source, and the flow into the source, given
by the summation $\sum_{v \in V} f(v, s)$, will be 0. We include it, however, because when
we introduce residual networks later in this chapter, the flow into the source will
become significant. In the maximum-flow problem, we are given a flow network $G$
with source $s$ and sink $t$, and we wish to find a flow of maximum value.

Before  seeing  an  example  of  a  network-flow  problem,  let  us  briefly  explore  the 
definition  of  flow  and  the  two  flow  properties.  The  capacity  constraint  simply 
says  that  the  flow  from  one  vertex  to  another  must  be  nonnegative  and  must  not 
exceed  the  given  capacity.  The  flow-conservation  property  says  that  the  total  flow 
into  a  vertex  other  than  the  source  or  sink  must  equal  the  total  flow  out  of  that 
vertex; informally, “flow in equals flow out.”
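
As an illustration, the two flow properties and equation (26.1) can be checked mechanically. The following Python sketch is not part of the book's pseudocode. It assumes a network stored as a dictionary that maps each edge $(u, v)$ to a number, and the names `CAP`, `FLOW`, `is_flow`, and `flow_value` are hypothetical. The capacities and flows are assumed values, chosen to agree with the quantities quoted for Figure 26.1(b), such as $|f| = 19$.

```python
# Assumed example data: edge -> capacity, and edge -> flow.
CAP = {
    ("s", "v1"): 16, ("s", "v2"): 13, ("v1", "v3"): 12,
    ("v2", "v1"): 4, ("v2", "v4"): 14, ("v3", "v2"): 9,
    ("v3", "t"): 20, ("v4", "v3"): 7, ("v4", "t"): 4,
}
FLOW = {
    ("s", "v1"): 11, ("s", "v2"): 8, ("v1", "v3"): 12,
    ("v2", "v1"): 1, ("v2", "v4"): 11, ("v3", "v2"): 4,
    ("v3", "t"): 15, ("v4", "v3"): 7, ("v4", "t"): 4,
}


def flow_value(flow, s):
    """|f|: total flow out of s minus total flow into s."""
    out_of_s = sum(x for (u, _), x in flow.items() if u == s)
    into_s = sum(x for (_, v), x in flow.items() if v == s)
    return out_of_s - into_s


def is_flow(cap, flow, s, t):
    """Check the capacity constraint and flow conservation."""
    # Capacity constraint: 0 <= f(u, v) <= c(u, v); c is 0 off the edge set.
    for edge, x in flow.items():
        if x < 0 or x > cap.get(edge, 0):
            return False
    # Flow conservation at every vertex other than s and t.
    vertices = {w for edge in cap for w in edge}
    for u in vertices - {s, t}:
        inflow = sum(x for (_, v), x in flow.items() if v == u)
        outflow = sum(x for (w, _), x in flow.items() if w == u)
        if inflow != outflow:
            return False
    return True
```

With this data, `is_flow(CAP, FLOW, "s", "t")` holds and `flow_value(FLOW, "s")` evaluates to 19.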

An  example  of  flow 

A flow network can model the trucking problem shown in Figure 26.1(a). The
Lucky Puck Company has a factory (source $s$) in Vancouver that manufactures
hockey pucks, and it has a warehouse (sink $t$) in Winnipeg that stocks them. Lucky
Puck leases space on trucks from another firm to ship the pucks from the factory
to the warehouse. Because the trucks travel over specified routes (edges) between
cities (vertices) and have a limited capacity, Lucky Puck can ship at most $c(u, v)$
crates per day between each pair of cities $u$ and $v$ in Figure 26.1(a). Lucky Puck
has no control over these routes and capacities, and so the company cannot alter
the flow network shown in Figure 26.1(a). They need to determine the largest
number $p$ of crates per day that they can ship and then to produce this amount, since
there is no point in producing more pucks than they can ship to their warehouse.
Lucky Puck is not concerned with how long it takes for a given puck to get from
the factory to the warehouse; they care only that $p$ crates per day leave the factory
and $p$ crates per day arrive at the warehouse.

Figure 26.2 Converting a network with antiparallel edges to an equivalent one with no antiparallel
edges. (a) A flow network containing both the edges $(v_1, v_2)$ and $(v_2, v_1)$. (b) An equivalent network
with no antiparallel edges. We add the new vertex $v'$, and we replace edge $(v_1, v_2)$ by the pair of
edges $(v_1, v')$ and $(v', v_2)$, both with the same capacity as $(v_1, v_2)$.

We  can  model  the  “flow”  of  shipments  with  a  flow  in  this  network  because  the 
number  of  crates  shipped  per  day  from  one  city  to  another  is  subject  to  a  capacity 
constraint.  Additionally,  the  model  must  obey  flow  conservation,  for  in  a  steady 
state,  the  rate  at  which  pucks  enter  an  intermediate  city  must  equal  the  rate  at  which 
they  leave.  Otherwise,  crates  would  accumulate  at  intermediate  cities. 

Modeling  problems  with  antiparallel  edges 

Suppose  that  the  trucking  firm  offered  Lucky  Puck  the  opportunity  to  lease  space 
for  10  crates  in  trucks  going  from  Edmonton  to  Calgary.  It  would  seem  natural  to 
add  this  opportunity  to  our  example  and  form  the  network  shown  in  Figure  26.2(a). 
This network suffers from one problem, however: it violates our original assumption
that if an edge $(v_1, v_2) \in E$, then $(v_2, v_1) \notin E$. We call the two edges $(v_1, v_2)$
and $(v_2, v_1)$ antiparallel. Thus, if we wish to model a flow problem with antiparallel
edges, we must transform the network into an equivalent one containing no
antiparallel edges. Figure 26.2(b) displays this equivalent network. We choose
one of the two antiparallel edges, in this case $(v_1, v_2)$, and split it by adding a new
vertex $v'$ and replacing edge $(v_1, v_2)$ with the pair of edges $(v_1, v')$ and $(v', v_2)$.
We  also  set  the  capacity  of  both  new  edges  to  the  capacity  of  the  original  edge. 
The  resulting  network  satisfies  the  property  that  if  an  edge  is  in  the  network,  the 
reverse  edge  is  not.  Exercise  26.1-1  asks  you  to  prove  that  the  resulting  network  is 
equivalent  to  the  original  one. 

Thus,  we  see  that  a  real-world  flow  problem  might  be  most  naturally  modeled 
by a network with antiparallel edges. It will be convenient to disallow antiparallel
edges, however, and so we have a straightforward way to convert a network
containing  antiparallel  edges  into  an  equivalent  one  with  no  antiparallel  edges. 
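
The conversion just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the network is a dictionary from edges to capacities, vertex labels are strings that can be compared with `<`, and the function name `split_antiparallel` and the tuple used as the fresh vertex $v'$ are hypothetical choices, not notation from the text.

```python
def split_antiparallel(cap):
    """Return an equivalent capacity map with no antiparallel edges.

    For each antiparallel pair, one edge (u, v) is replaced by the two
    edges (u, x) and (x, v), both with capacity c(u, v), where x is new.
    """
    result = {}
    for (u, v), c in cap.items():
        # Split exactly one edge of each pair: the one with u < v.
        if (v, u) in cap and u < v:
            x = ("split", u, v)  # fresh vertex; cannot clash with str labels
            result[(u, x)] = c
            result[(x, v)] = c
        else:
            result[(u, v)] = c
    return result


# Assumed toy network with the antiparallel pair (v1, v2) and (v2, v1).
example = {("s", "v1"): 16, ("v1", "v2"): 10, ("v2", "v1"): 4}
converted = split_antiparallel(example)
```

Which edge of a pair gets split does not matter for the equivalence; the `u < v` test is just a deterministic way to pick one.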

Networks  with  multiple  sources  and  sinks 

A  maximum-flow  problem  may  have  several  sources  and  sinks,  rather  than  just 
one  of  each.  The  Lucky  Puck  Company,  for  example,  might  actually  have  a  set 
of $m$ factories $\{s_1, s_2, \ldots, s_m\}$ and a set of $n$ warehouses $\{t_1, t_2, \ldots, t_n\}$, as shown
in  Figure  26.3(a).  Fortunately,  this  problem  is  no  harder  than  ordinary  maximum 
flow. 

We  can  reduce  the  problem  of  determining  a  maximum  flow  in  a  network  with 
multiple sources and multiple sinks to an ordinary maximum-flow problem. Figure
26.3(b) shows how to convert the network from (a) to an ordinary flow network
with only a single source and a single sink. We add a supersource $s$ and add a
directed edge $(s, s_i)$ with capacity $c(s, s_i) = \infty$ for each $i = 1, 2, \ldots, m$. We also
create a new supersink $t$ and add a directed edge $(t_i, t)$ with capacity $c(t_i, t) = \infty$
for each $i = 1, 2, \ldots, n$. Intuitively, any flow in the network in (a) corresponds to
a flow in the network in (b), and vice versa. The single source $s$ simply provides
as much flow as desired for the multiple sources $s_i$, and the single sink $t$ likewise
consumes as much flow as desired for the multiple sinks $t_i$. Exercise 26.1-2 asks
you to prove formally that the two problems are equivalent.
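
The reduction is mechanical, as the following Python sketch suggests. It is an illustration only: the dictionary representation, the function name `add_super_terminals`, and the default labels `"s"` and `"t"` for the supersource and supersink are assumptions made for this example.

```python
import math


def add_super_terminals(cap, sources, sinks, s="s", t="t"):
    """Return a single-source, single-sink network built from cap.

    Adds an edge (s, s_i) of infinite capacity for every source s_i and
    an edge (t_i, t) of infinite capacity for every sink t_i.
    """
    vertices = {w for edge in cap for w in edge}
    if s in vertices or t in vertices:
        raise ValueError("supersource and supersink must be new vertices")
    result = dict(cap)
    for s_i in sources:
        result[(s, s_i)] = math.inf
    for t_i in sinks:
        result[(t_i, t)] = math.inf
    return result


# Assumed toy network with two sources and two sinks.
multi = {("s1", "a"): 3, ("s2", "a"): 5, ("a", "t1"): 4, ("a", "t2"): 2}
single = add_super_terminals(multi, ["s1", "s2"], ["t1", "t2"])
```

Only $m + n$ edges are added, so the converted network has $|E| + m + n$ edges.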

Exercises 


26.1-1 

Show  that  splitting  an  edge  in  a  flow  network  yields  an  equivalent  network.  More 
formally, suppose that flow network $G$ contains edge $(u, v)$, and we create a new
flow network $G'$ by creating a new vertex $x$ and replacing $(u, v)$ by new edges
$(u, x)$ and $(x, v)$ with $c(u, x) = c(x, v) = c(u, v)$. Show that a maximum flow
in  G'  has  the  same  value  as  a  maximum  flow  in  G. 

Figure 26.3 Converting a multiple-source, multiple-sink maximum-flow problem into a problem
with a single source and a single sink. (a) A flow network with five sources $S = \{s_1, s_2, s_3, s_4, s_5\}$
and three sinks $T = \{t_1, t_2, t_3\}$. (b) An equivalent single-source, single-sink flow network. We add
a supersource $s$ and an edge with infinite capacity from $s$ to each of the multiple sources. We also
add a supersink $t$ and an edge with infinite capacity from each of the multiple sinks to $t$.


26.1-2 

Extend  the  flow  properties  and  definitions  to  the  multiple-source,  multiple-sink 
problem.  Show  that  any  flow  in  a  multiple-source,  multiple-sink  flow  network 
corresponds  to  a  flow  of  identical  value  in  the  single-source,  single-sink  network 
obtained  by  adding  a  supersource  and  a  supersink,  and  vice  versa. 


26.1-3 

Suppose that a flow network $G = (V, E)$ violates the assumption that the network
contains a path $s \leadsto v \leadsto t$ for all vertices $v \in V$. Let $u$ be a vertex for which there
is no path $s \leadsto u \leadsto t$. Show that there must exist a maximum flow $f$ in $G$ such
that $f(u, v) = f(v, u) = 0$ for all vertices $v \in V$.


26.1-4

Let $f$ be a flow in a network, and let $\alpha$ be a real number. The scalar flow product,
denoted $\alpha f$, is a function from $V \times V$ to $\mathbb{R}$ defined by

$$(\alpha f)(u, v) = \alpha \cdot f(u, v) .$$

Prove that the flows in a network form a convex set. That is, show that if $f_1$ and $f_2$
are flows, then so is $\alpha f_1 + (1 - \alpha) f_2$ for all $\alpha$ in the range $0 \le \alpha \le 1$.


26.1-5 

State  the  maximum-flow  problem  as  a  linear-programming  problem. 


26.1-6 

Professor Adam has two children who, unfortunately, dislike each other. The problem
is so severe that not only do they refuse to walk to school together, but in fact
each one refuses to walk on any block that the other child has stepped on that day.
The children have no problem with their paths crossing at a corner. Fortunately,
both the professor’s house and the school are on corners, but beyond that he is not
sure if it is going to be possible to send both of his children to the same school.
The professor has a map of his town. Show how to formulate the problem of
determining whether both his children can go to the same school as a maximum-flow
problem.


26.1-7 

Suppose that, in addition to edge capacities, a flow network has vertex capacities.
That is, each vertex $v$ has a limit $l(v)$ on how much flow can pass through $v$. Show
how to transform a flow network $G = (V, E)$ with vertex capacities into an equivalent
flow network $G' = (V', E')$ without vertex capacities, such that a maximum
flow in $G'$ has the same value as a maximum flow in $G$. How many vertices and
edges does $G'$ have?


26.2  The  Ford-Fulkerson  method 

This  section  presents  the  Ford-Fulkerson  method  for  solving  the  maximum-flow 
problem.  We  call  it  a  “method”  rather  than  an  “algorithm”  because  it  encompasses 
several  implementations  with  differing  running  times.  The  Ford-Fulkerson  method 
depends  on  three  important  ideas  that  transcend  the  method  and  are  relevant  to 
many  flow  algorithms  and  problems:  residual  networks,  augmenting  paths,  and 
cuts. These ideas are essential to the important max-flow min-cut theorem
(Theorem 26.6), which characterizes the value of a maximum flow in terms of cuts of
the flow network. We end this section by presenting one specific implementation
of the Ford-Fulkerson method and analyzing its running time.

The Ford-Fulkerson method iteratively increases the value of the flow. We start
with $f(u, v) = 0$ for all $u, v \in V$, giving an initial flow of value 0. At each
iteration, we increase the flow value in $G$ by finding an “augmenting path” in an
associated “residual network” $G_f$. Once we know the edges of an augmenting
path in $G_f$, we can easily identify specific edges in $G$ for which we can change
the flow so that we increase the value of the flow. Although each iteration of the
Ford-Fulkerson method increases the value of the flow, we shall see that the flow
on any particular edge of $G$ may increase or decrease; decreasing the flow on some
edges may be necessary in order to enable an algorithm to send more flow from the
source to the sink. We repeatedly augment the flow until the residual network has
no more augmenting paths. The max-flow min-cut theorem will show that upon
termination, this process yields a maximum flow.

FORD-FULKERSON-METHOD(G, s, t)
1  initialize flow f to 0
2  while there exists an augmenting path p in the residual network G_f
3      augment flow f along p
4  return f

In order to implement and analyze the Ford-Fulkerson method, we need to introduce
several additional concepts.

Residual  networks 

Intuitively, given a flow network $G$ and a flow $f$, the residual network $G_f$ consists
of edges with capacities that represent how we can change the flow on edges of $G$.
An edge of the flow network can admit an amount of additional flow equal to the
edge’s capacity minus the flow on that edge. If that value is positive, we place
that edge into $G_f$ with a “residual capacity” of $c_f(u, v) = c(u, v) - f(u, v)$.
The only edges of $G$ that are in $G_f$ are those that can admit more flow; those
edges $(u, v)$ whose flow equals their capacity have $c_f(u, v) = 0$, and they are not
in $G_f$.

The residual network $G_f$ may also contain edges that are not in $G$, however.
As an algorithm manipulates the flow, with the goal of increasing the total flow, it
might need to decrease the flow on a particular edge. In order to represent a possible
decrease of a positive flow $f(u, v)$ on an edge in $G$, we place an edge $(v, u)$
into $G_f$ with residual capacity $c_f(v, u) = f(u, v)$, that is, an edge that can admit
flow in the opposite direction to $(u, v)$, at most canceling out the flow on $(u, v)$.
These reverse edges in the residual network allow an algorithm to send back flow
it has already sent along an edge. Sending flow back along an edge is equivalent
to decreasing the flow on the edge, which is a necessary operation in many
algorithms.

More formally, suppose that we have a flow network $G = (V, E)$ with source $s$
and sink $t$. Let $f$ be a flow in $G$, and consider a pair of vertices $u, v \in V$. We
define the residual capacity $c_f(u, v)$ by

$$c_f(u, v) = \begin{cases}
c(u, v) - f(u, v) & \text{if } (u, v) \in E , \\
f(v, u) & \text{if } (v, u) \in E , \\
0 & \text{otherwise} .
\end{cases} \tag{26.2}$$

Because of our assumption that $(u, v) \in E$ implies $(v, u) \notin E$, exactly one case in
equation (26.2) applies to each ordered pair of vertices.

As an example of equation (26.2), if $c(u, v) = 16$ and $f(u, v) = 11$, then we
can increase $f(u, v)$ by up to $c_f(u, v) = 5$ units before we exceed the capacity
constraint on edge $(u, v)$. We also wish to allow an algorithm to return up to 11
units of flow from $v$ to $u$, and hence $c_f(v, u) = 11$.
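
The three cases of equation (26.2) translate directly into code. The Python sketch below is an illustration, assuming capacities and flows stored as dictionaries keyed by edge; the function name `residual_capacity` is hypothetical.

```python
def residual_capacity(cap, flow, u, v):
    """c_f(u, v) as in equation (26.2), assuming no antiparallel edges."""
    if (u, v) in cap:
        # Forward edge: the capacity that is still unused.
        return cap[(u, v)] - flow.get((u, v), 0)
    if (v, u) in cap:
        # Reverse edge: flow on (v, u) that could be sent back.
        return flow.get((v, u), 0)
    return 0


# The example from the text: c(u, v) = 16 and f(u, v) = 11.
cap = {("u", "v"): 16}
flow = {("u", "v"): 11}
forward = residual_capacity(cap, flow, "u", "v")
backward = residual_capacity(cap, flow, "v", "u")
```

Here `forward` is 5 and `backward` is 11, as in the example above.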

Given a flow network $G = (V, E)$ and a flow $f$, the residual network of $G$
induced by $f$ is $G_f = (V, E_f)$, where

$$E_f = \{(u, v) \in V \times V : c_f(u, v) > 0\} . \tag{26.3}$$

That is, as promised above, each edge of the residual network, or residual edge,
can admit a flow that is greater than 0. Figure 26.4(a) repeats the flow network $G$
and flow $f$ of Figure 26.1(b), and Figure 26.4(b) shows the corresponding residual
network $G_f$. The edges in $E_f$ are either edges in $E$ or their reversals, and thus

$$|E_f| \le 2|E| .$$

Observe that the residual network $G_f$ is similar to a flow network with capacities
given by $c_f$. It does not satisfy our definition of a flow network because it may
contain both an edge $(u, v)$ and its reversal $(v, u)$. Other than this difference, a
residual network has the same properties as a flow network, and we can define a
flow in the residual network as one that satisfies the definition of a flow, but with
respect to capacities $c_f$ in the network $G_f$.
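
To make equation (26.3) concrete, the following Python sketch builds the residual edge set with its residual capacities. It is an illustration only: the dictionary representation and the names `CAP`, `FLOW`, and `residual_network` are assumptions, and the numbers are assumed values consistent with those quoted for Figure 26.1(b) and Figure 26.4(b).

```python
# Assumed example data: edge -> capacity, and edge -> flow.
CAP = {
    ("s", "v1"): 16, ("s", "v2"): 13, ("v1", "v3"): 12,
    ("v2", "v1"): 4, ("v2", "v4"): 14, ("v3", "v2"): 9,
    ("v3", "t"): 20, ("v4", "v3"): 7, ("v4", "t"): 4,
}
FLOW = {
    ("s", "v1"): 11, ("s", "v2"): 8, ("v1", "v3"): 12,
    ("v2", "v1"): 1, ("v2", "v4"): 11, ("v3", "v2"): 4,
    ("v3", "t"): 15, ("v4", "v3"): 7, ("v4", "t"): 4,
}


def residual_network(cap, flow):
    """Map each residual edge (u, v) to c_f(u, v) > 0.

    Relies on the no-antiparallel-edges assumption, so a forward edge
    and a reverse edge never share the same key.
    """
    residual = {}
    for (u, v), c in cap.items():
        x = flow.get((u, v), 0)
        if c - x > 0:
            residual[(u, v)] = c - x   # room to push more flow
        if x > 0:
            residual[(v, u)] = x       # flow that can be cancelled
    return residual


RESIDUAL = residual_network(CAP, FLOW)
```

Each edge of $E$ contributes at most two residual edges, which is the bound $|E_f| \le 2|E|$; saturated edges such as $(v_1, v_3)$ contribute only their reversal.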

A flow in a residual network provides a roadmap for adding flow to the original
flow network. If $f$ is a flow in $G$ and $f'$ is a flow in the corresponding residual
network $G_f$, we define $f \uparrow f'$, the augmentation of flow $f$ by $f'$, to be a function
from $V \times V$ to $\mathbb{R}$, defined by

$$(f \uparrow f')(u, v) = \begin{cases}
f(u, v) + f'(u, v) - f'(v, u) & \text{if } (u, v) \in E , \\
0 & \text{otherwise} .
\end{cases} \tag{26.4}$$

Figure 26.4 (a) The flow network $G$ and flow $f$ of Figure 26.1(b). (b) The residual network $G_f$
with augmenting path $p$ shaded; its residual capacity is $c_f(p) = c_f(v_2, v_3) = 4$. Edges with
residual capacity equal to 0, such as $(v_1, v_3)$, are not shown, a convention we follow in the remainder
of this section. (c) The flow in $G$ that results from augmenting along path $p$ by its residual capacity 4.
Edges carrying no flow, such as $(v_3, v_2)$, are labeled only by their capacity, another convention we
follow throughout. (d) The residual network induced by the flow in (c).


The intuition behind this definition follows the definition of the residual network.
We increase the flow on $(u, v)$ by $f'(u, v)$ but decrease it by $f'(v, u)$ because
pushing flow on the reverse edge in the residual network signifies decreasing the
flow in the original network. Pushing flow on the reverse edge in the residual
network is also known as cancellation. For example, if we send 5 crates of hockey
pucks from $u$ to $v$ and send 2 crates from $v$ to $u$, we could equivalently (from the
perspective of the final result) just send 3 crates from $u$ to $v$ and none from $v$ to $u$.
Cancellation of this type is crucial for any maximum-flow algorithm.
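
The augmentation of equation (26.4) can be sketched in Python as follows. The dictionary representation and the function name `augment` are assumptions for illustration, and the capacity 8 in the example is an assumed value, since the text gives only the crate counts.

```python
def augment(cap, flow, res_flow):
    """Return f ↑ f' as a dictionary defined on the edges of E.

    flow is a flow f in G; res_flow is a flow f' in the residual
    network, so it may be positive on reversals of edges of E.
    """
    return {
        (u, v): flow.get((u, v), 0)
        + res_flow.get((u, v), 0)   # extra flow pushed forward
        - res_flow.get((v, u), 0)   # flow cancelled via the reverse edge
        for (u, v) in cap
    }


# Cancellation example: 5 crates from u to v, then 2 sent back from v to u.
cap = {("u", "v"): 8}               # assumed capacity
f = {("u", "v"): 5}
f_prime = {("v", "u"): 2}
result = augment(cap, f, f_prime)
```

The result carries 3 crates on $(u, v)$ and is undefined (implicitly 0) on $(v, u)$, which is not an edge of $E$.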

Lemma  26.1 

Let $G = (V, E)$ be a flow network with source $s$ and sink $t$, and let $f$ be a flow
in $G$. Let $G_f$ be the residual network of $G$ induced by $f$, and let $f'$ be a flow
in $G_f$. Then the function $f \uparrow f'$ defined in equation (26.4) is a flow in $G$ with
value $|f \uparrow f'| = |f| + |f'|$.

Proof We first verify that $f \uparrow f'$ obeys the capacity constraint for each edge in $E$
and flow conservation at each vertex in $V - \{s, t\}$.

For the capacity constraint, first observe that if $(u, v) \in E$, then $c_f(v, u) =
f(u, v)$. Therefore, we have $f'(v, u) \le c_f(v, u) = f(u, v)$, and hence

$$\begin{aligned}
(f \uparrow f')(u, v) &= f(u, v) + f'(u, v) - f'(v, u) && \text{(by equation (26.4))} \\
&\ge f(u, v) + f'(u, v) - f(u, v) && \text{(because } f'(v, u) \le f(u, v) \text{)} \\
&= f'(u, v) \\
&\ge 0 .
\end{aligned}$$

In addition,

$$\begin{aligned}
(f \uparrow f')(u, v) &= f(u, v) + f'(u, v) - f'(v, u) && \text{(by equation (26.4))} \\
&\le f(u, v) + f'(u, v) && \text{(because flows are nonnegative)} \\
&\le f(u, v) + c_f(u, v) && \text{(capacity constraint)} \\
&= f(u, v) + c(u, v) - f(u, v) && \text{(definition of } c_f \text{)} \\
&= c(u, v) .
\end{aligned}$$

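For flow conservation, because both $f$ and $f'$ obey flow conservation, we have
that for all $u \in V - \{s, t\}$,

$$\begin{aligned}
\sum_{v \in V} (f \uparrow f')(u, v)
&= \sum_{v \in V} \bigl( f(u, v) + f'(u, v) - f'(v, u) \bigr) \\
&= \sum_{v \in V} f(u, v) + \sum_{v \in V} f'(u, v) - \sum_{v \in V} f'(v, u) \\
&= \sum_{v \in V} f(v, u) + \sum_{v \in V} f'(v, u) - \sum_{v \in V} f'(u, v) \\
&= \sum_{v \in V} \bigl( f(v, u) + f'(v, u) - f'(u, v) \bigr) \\
&= \sum_{v \in V} (f \uparrow f')(v, u) ,
\end{aligned}$$

where the third line follows from the second by flow conservation.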

Finally, we compute the value of $f \uparrow f'$. Recall that we disallow antiparallel
edges in $G$ (but not in $G_f$), and hence for each vertex $v \in V$, we know that there
can be an edge $(s, v)$ or $(v, s)$, but never both. We define $V_1 = \{v : (s, v) \in E\}$
to be the set of vertices with edges from $s$, and $V_2 = \{v : (v, s) \in E\}$ to be the
set of vertices with edges to $s$. We have $V_1 \cup V_2 \subseteq V$ and, because we disallow
antiparallel edges, $V_1 \cap V_2 = \emptyset$. We now compute

$$\begin{aligned}
|f \uparrow f'| &= \sum_{v \in V} (f \uparrow f')(s, v) - \sum_{v \in V} (f \uparrow f')(v, s) \\
&= \sum_{v \in V_1} (f \uparrow f')(s, v) - \sum_{v \in V_2} (f \uparrow f')(v, s) ,
\end{aligned} \tag{26.5}$$

where the second line follows because $(f \uparrow f')(w, x)$ is 0 if $(w, x) \notin E$. We now
apply the definition of $f \uparrow f'$ to equation (26.5), and then reorder and group terms
to obtain

$$\begin{aligned}
|f \uparrow f'|
&= \sum_{v \in V_1} \bigl( f(s, v) + f'(s, v) - f'(v, s) \bigr)
 - \sum_{v \in V_2} \bigl( f(v, s) + f'(v, s) - f'(s, v) \bigr) \\
&= \sum_{v \in V_1} f(s, v) + \sum_{v \in V_1} f'(s, v) - \sum_{v \in V_1} f'(v, s) \\
&\qquad - \sum_{v \in V_2} f(v, s) - \sum_{v \in V_2} f'(v, s) + \sum_{v \in V_2} f'(s, v) \\
&= \sum_{v \in V_1} f(s, v) - \sum_{v \in V_2} f(v, s) \\
&\qquad + \sum_{v \in V_1} f'(s, v) + \sum_{v \in V_2} f'(s, v)
 - \sum_{v \in V_1} f'(v, s) - \sum_{v \in V_2} f'(v, s) \\
&= \sum_{v \in V_1} f(s, v) - \sum_{v \in V_2} f(v, s)
 + \sum_{v \in V_1 \cup V_2} f'(s, v) - \sum_{v \in V_1 \cup V_2} f'(v, s) .
\end{aligned} \tag{26.6}$$

In equation (26.6), we can extend all four summations to sum over $V$, since each
additional term has value 0. (Exercise 26.2-1 asks you to prove this formally.) We
thus have

$$\begin{aligned}
|f \uparrow f'| &= \sum_{v \in V} f(s, v) - \sum_{v \in V} f(v, s)
 + \sum_{v \in V} f'(s, v) - \sum_{v \in V} f'(v, s) \\
&= |f| + |f'| .
\end{aligned} \tag{26.7}$$
■


Augmenting  paths 

Given a flow network $G = (V, E)$ and a flow $f$, an augmenting path $p$ is a
simple path from $s$ to $t$ in the residual network $G_f$. By the definition of the residual
network, we may increase the flow on an edge $(u, v)$ of an augmenting path
by up to $c_f(u, v)$ without violating the capacity constraint on whichever of $(u, v)$
and $(v, u)$ is in the original flow network $G$.

The shaded path in Figure 26.4(b) is an augmenting path. Treating the residual
network $G_f$ in the figure as a flow network, we can increase the flow through each
edge of this path by up to 4 units without violating a capacity constraint, since the
smallest residual capacity on this path is $c_f(v_2, v_3) = 4$. We call the maximum
amount by which we can increase the flow on each edge in an augmenting path $p$
the residual capacity of $p$, given by

$$c_f(p) = \min \{ c_f(u, v) : (u, v) \text{ is on } p \} .$$

The following lemma, whose proof we leave as Exercise 26.2-7, makes the above
argument more precise.


Lemma 26.2

Let $G = (V, E)$ be a flow network, let $f$ be a flow in $G$, and let $p$ be an augmenting
path in $G_f$. Define a function $f_p : V \times V \to \mathbb{R}$ by

$$f_p(u, v) = \begin{cases}
c_f(p) & \text{if } (u, v) \text{ is on } p , \\
0 & \text{otherwise} .
\end{cases} \tag{26.8}$$

Then, $f_p$ is a flow in $G_f$ with value $|f_p| = c_f(p) > 0$.


The following corollary shows that if we augment $f$ by $f_p$, we get another flow
in $G$ whose value is closer to the maximum. Figure 26.4(c) shows the result of
augmenting the flow $f$ from Figure 26.4(a) by the flow $f_p$ in Figure 26.4(b), and
Figure 26.4(d) shows the ensuing residual network.

Corollary 26.3

Let $G = (V, E)$ be a flow network, let $f$ be a flow in $G$, and let $p$ be an
augmenting path in $G_f$. Let $f_p$ be defined as in equation (26.8), and suppose
that we augment $f$ by $f_p$. Then the function $f \uparrow f_p$ is a flow in $G$ with value

$$|f \uparrow f_p| = |f| + |f_p| > |f| .$$

Proof Immediate from Lemmas 26.1 and 26.2. ■
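
A single augmentation step can be traced in code. The Python sketch below is an illustration under assumptions: the data are assumed values consistent with those quoted for Figure 26.4, the path $s, v_2, v_3, t$ is taken to be the shaded path because it is the one whose bottleneck is $c_f(v_2, v_3) = 4$, and all function names are hypothetical.

```python
# Assumed example data: edge -> capacity, and edge -> flow (|f| = 19).
CAP = {
    ("s", "v1"): 16, ("s", "v2"): 13, ("v1", "v3"): 12,
    ("v2", "v1"): 4, ("v2", "v4"): 14, ("v3", "v2"): 9,
    ("v3", "t"): 20, ("v4", "v3"): 7, ("v4", "t"): 4,
}
FLOW = {
    ("s", "v1"): 11, ("s", "v2"): 8, ("v1", "v3"): 12,
    ("v2", "v1"): 1, ("v2", "v4"): 11, ("v3", "v2"): 4,
    ("v3", "t"): 15, ("v4", "v3"): 7, ("v4", "t"): 4,
}


def residual_capacity(cap, flow, u, v):
    """c_f(u, v), assuming no antiparallel edges."""
    if (u, v) in cap:
        return cap[(u, v)] - flow.get((u, v), 0)
    if (v, u) in cap:
        return flow.get((v, u), 0)
    return 0


def path_residual_capacity(cap, flow, path):
    """c_f(p): the smallest residual capacity along the vertex list path."""
    return min(residual_capacity(cap, flow, u, v)
               for u, v in zip(path, path[1:]))


def augment_along(cap, flow, path):
    """Return the flow f ↑ f_p for the augmenting path p."""
    delta = path_residual_capacity(cap, flow, path)
    new_flow = dict(flow)
    for u, v in zip(path, path[1:]):
        if (u, v) in cap:
            new_flow[(u, v)] = new_flow.get((u, v), 0) + delta  # push forward
        else:
            new_flow[(v, u)] = new_flow[(v, u)] - delta         # cancel flow
    return new_flow


PATH = ["s", "v2", "v3", "t"]
NEW_FLOW = augment_along(CAP, FLOW, PATH)
```

The residual capacity of the path is 4, the flow on $(v_3, v_2)$ drops from 4 to 0 by cancellation, and the value of the flow rises from 19 to 23.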

Cuts  of  flow  networks 

The  Ford-Fulkerson  method  repeatedly  augments  the  flow  along  augmenting  paths 
until  it  has  found  a  maximum  flow.  How  do  we  know  that  when  the  algorithm 
terminates, we have actually found a maximum flow? The max-flow min-cut theorem,
which we shall prove shortly, tells us that a flow is maximum if and only if its
residual  network  contains  no  augmenting  path.  To  prove  this  theorem,  though,  we 
must  first  explore  the  notion  of  a  cut  of  a  flow  network. 

A cut $(S, T)$ of flow network $G = (V, E)$ is a partition of $V$ into $S$ and
$T = V - S$ such that $s \in S$ and $t \in T$. (This definition is similar to the definition
of “cut” that we used for minimum spanning trees in Chapter 23, except
that here we are cutting a directed graph rather than an undirected graph, and we
insist that $s \in S$ and $t \in T$.) If $f$ is a flow, then the net flow $f(S, T)$ across the
cut $(S, T)$ is defined to be

$$f(S, T) = \sum_{u \in S} \sum_{v \in T} f(u, v) - \sum_{u \in S} \sum_{v \in T} f(v, u) . \tag{26.9}$$

Figure 26.5 A cut $(S, T)$ in the flow network of Figure 26.1(b), where $S = \{s, v_1, v_2\}$ and
$T = \{v_3, v_4, t\}$. The vertices in $S$ are black, and the vertices in $T$ are white. The net flow
across $(S, T)$ is $f(S, T) = 19$, and the capacity is $c(S, T) = 26$.

The capacity of the cut $(S, T)$ is

$$c(S, T) = \sum_{u \in S} \sum_{v \in T} c(u, v) . \tag{26.10}$$

A minimum cut of a network is a cut whose capacity is minimum over all cuts of
the network.

The asymmetry between the definitions of flow and capacity of a cut is intentional
and important. For capacity, we count only the capacities of edges going
from  S  to  T,  ignoring  edges  in  the  reverse  direction.  For  flow,  we  consider  the 
flow  going  from  S  to  T  minus  the  flow  going  in  the  reverse  direction  from  T  to  S. 
The  reason  for  this  difference  will  become  clear  later  in  this  section. 

Figure 26.5 shows the cut $(\{s, v_1, v_2\}, \{v_3, v_4, t\})$ in the flow network of
Figure 26.1(b). The net flow across this cut is

$$f(v_1, v_3) + f(v_2, v_4) - f(v_3, v_2) = 12 + 11 - 4 = 19 ,$$

and the capacity of this cut is

$$c(v_1, v_3) + c(v_2, v_4) = 12 + 14 = 26 .$$
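
The same two quantities can be computed directly from equations (26.9) and (26.10). The following Python sketch is an illustration; the dictionary representation, the names `net_flow` and `cut_capacity`, and the edge values not quoted in the text are assumptions chosen to be consistent with Figure 26.1(b).

```python
# Assumed example data: edge -> capacity, and edge -> flow.
CAP = {
    ("s", "v1"): 16, ("s", "v2"): 13, ("v1", "v3"): 12,
    ("v2", "v1"): 4, ("v2", "v4"): 14, ("v3", "v2"): 9,
    ("v3", "t"): 20, ("v4", "v3"): 7, ("v4", "t"): 4,
}
FLOW = {
    ("s", "v1"): 11, ("s", "v2"): 8, ("v1", "v3"): 12,
    ("v2", "v1"): 1, ("v2", "v4"): 11, ("v3", "v2"): 4,
    ("v3", "t"): 15, ("v4", "v3"): 7, ("v4", "t"): 4,
}


def net_flow(flow, S):
    """f(S, T): flow from S to T minus flow from T to S, with T = V - S."""
    forward = sum(x for (u, v), x in flow.items() if u in S and v not in S)
    backward = sum(x for (u, v), x in flow.items() if u not in S and v in S)
    return forward - backward


def cut_capacity(cap, S):
    """c(S, T): capacities of edges from S to T only."""
    return sum(c for (u, v), c in cap.items() if u in S and v not in S)


S = {"s", "v1", "v2"}
```

For this cut, `net_flow(FLOW, S)` is 19 and `cut_capacity(CAP, S)` is 26. Note that the edge $(v_3, v_2)$ is subtracted in the net flow but ignored in the capacity.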

The following lemma shows that, for a given flow $f$, the net flow across any cut
is the same, and it equals $|f|$, the value of the flow.

Lemma  26.4 

Let $f$ be a flow in a flow network $G$ with source $s$ and sink $t$, and let $(S, T)$ be any
cut of $G$. Then the net flow across $(S, T)$ is $f(S, T) = |f|$.

Proof We can rewrite the flow-conservation condition for any node $u \in V - \{s, t\}$
as

$$\sum_{v \in V} f(u, v) - \sum_{v \in V} f(v, u) = 0 . \tag{26.11}$$

Taking the definition of $|f|$ from equation (26.1) and adding the left-hand side of
equation (26.11), which equals 0, summed over all vertices in $S - \{s\}$, gives

$$|f| = \sum_{v \in V} f(s, v) - \sum_{v \in V} f(v, s)
 + \sum_{u \in S - \{s\}} \Bigl( \sum_{v \in V} f(u, v) - \sum_{v \in V} f(v, u) \Bigr) .$$

Expanding  the  right-hand  summation  and  regrouping  terms  yields 

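$$\begin{aligned}
|f| &= \sum_{v \in V} f(s, v) - \sum_{v \in V} f(v, s)
 + \sum_{u \in S - \{s\}} \sum_{v \in V} f(u, v)
 - \sum_{u \in S - \{s\}} \sum_{v \in V} f(v, u) \\
&= \sum_{v \in V} \Bigl( f(s, v) + \sum_{u \in S - \{s\}} f(u, v) \Bigr)
 - \sum_{v \in V} \Bigl( f(v, s) + \sum_{u \in S - \{s\}} f(v, u) \Bigr) \\
&= \sum_{v \in V} \sum_{u \in S} f(u, v) - \sum_{v \in V} \sum_{u \in S} f(v, u) .
\end{aligned}$$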


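Because $V = S \cup T$ and $S \cap T = \emptyset$, we can split each summation over $V$ into
summations over $S$ and $T$ to obtain

$$\begin{aligned}
|f| &= \sum_{v \in S} \sum_{u \in S} f(u, v) + \sum_{v \in T} \sum_{u \in S} f(u, v)
 - \sum_{v \in S} \sum_{u \in S} f(v, u) - \sum_{v \in T} \sum_{u \in S} f(v, u) \\
&= \sum_{v \in T} \sum_{u \in S} f(u, v) - \sum_{v \in T} \sum_{u \in S} f(v, u) \\
&\qquad + \Bigl( \sum_{v \in S} \sum_{u \in S} f(u, v) - \sum_{v \in S} \sum_{u \in S} f(v, u) \Bigr) .
\end{aligned}$$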


The two summations within the parentheses are actually the same, since for all
vertices $x, y \in S$, the term $f(x, y)$ appears once in each summation. Hence, these
summations cancel, and we have

$$\begin{aligned}
|f| &= \sum_{u \in S} \sum_{v \in T} f(u, v) - \sum_{u \in S} \sum_{v \in T} f(v, u) \\
&= f(S, T) .
\end{aligned}$$
■

A  corollary  to  Lemma  26.4  shows  how  we  can  use  cut  capacities  to  bound  the 
value  of  a  flow. 

Corollary 26.5

The value of any flow $f$ in a flow network $G$ is bounded from above by the capacity
of any cut of $G$.

Proof Let $(S, T)$ be any cut of $G$ and let $f$ be any flow. By Lemma 26.4 and the
capacity constraint,

$$\begin{aligned}
|f| &= f(S, T) \\
&= \sum_{u \in S} \sum_{v \in T} f(u, v) - \sum_{u \in S} \sum_{v \in T} f(v, u) \\
&\le \sum_{u \in S} \sum_{v \in T} f(u, v) \\
&\le \sum_{u \in S} \sum_{v \in T} c(u, v) \\
&= c(S, T) .
\end{aligned}$$
■
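
On a small network, Lemma 26.4 and Corollary 26.5 can be checked by brute force over every cut. The Python sketch below does so; it is an illustration only, with hypothetical function names and assumed data consistent with the values quoted for Figure 26.1(b). Enumerating all cuts takes time exponential in $|V|$, so this is a sanity check and not an algorithm.

```python
from itertools import combinations

# Assumed example data: edge -> capacity, and edge -> flow (|f| = 19).
CAP = {
    ("s", "v1"): 16, ("s", "v2"): 13, ("v1", "v3"): 12,
    ("v2", "v1"): 4, ("v2", "v4"): 14, ("v3", "v2"): 9,
    ("v3", "t"): 20, ("v4", "v3"): 7, ("v4", "t"): 4,
}
FLOW = {
    ("s", "v1"): 11, ("s", "v2"): 8, ("v1", "v3"): 12,
    ("v2", "v1"): 1, ("v2", "v4"): 11, ("v3", "v2"): 4,
    ("v3", "t"): 15, ("v4", "v3"): 7, ("v4", "t"): 4,
}


def net_flow(flow, S):
    """f(S, T) as in equation (26.9), with T = V - S."""
    forward = sum(x for (u, v), x in flow.items() if u in S and v not in S)
    backward = sum(x for (u, v), x in flow.items() if u not in S and v in S)
    return forward - backward


def cut_capacity(cap, S):
    """c(S, T) as in equation (26.10)."""
    return sum(c for (u, v), c in cap.items() if u in S and v not in S)


def all_cuts(cap, s, t):
    """Yield the set S of every cut (S, T) with s in S and t in T."""
    others = sorted({w for edge in cap for w in edge} - {s, t})
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            yield {s, *subset}


CUTS = list(all_cuts(CAP, "s", "t"))
NET_FLOWS = {net_flow(FLOW, S) for S in CUTS}
SMALLEST = min(cut_capacity(CAP, S) for S in CUTS)
```

All 16 cuts have net flow 19, so `NET_FLOWS` is `{19}`, and every cut capacity is at least 19. The smallest capacity is 23, which by Corollary 26.5 is an upper bound on the value of any flow in this network.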

Corollary  26.5  yields  the  immediate  consequence  that  the  value  of  a  maximum 
flow  in  a  network  is  bounded  from  above  by  the  capacity  of  a  minimum  cut  of 
the  network.  The  important  max-flow  min-cut  theorem,  which  we  now  state  and 
prove,  says  that  the  value  of  a  maximum  flow  is  in  fact  equal  to  the  capacity  of  a 
minimum  cut. 

Theorem 26.6 (Max-flow min-cut theorem)

If $f$ is a flow in a flow network $G = (V, E)$ with source $s$ and sink $t$, then the
following conditions are equivalent:

1. $f$ is a maximum flow in $G$.

2. The residual network $G_f$ contains no augmenting paths.

3. $|f| = c(S, T)$ for some cut $(S, T)$ of $G$.

Proof (1) $\Rightarrow$ (2): Suppose for the sake of contradiction that $f$ is a maximum
flow in $G$ but that $G_f$ has an augmenting path $p$. Then, by Corollary 26.3, the
flow found by augmenting $f$ by $f_p$, where $f_p$ is given by equation (26.8), is a flow
in $G$ with value strictly greater than $|f|$, contradicting the assumption that $f$ is a
maximum flow.

(2)  =>-  (3):  Suppose  that  G/  has  no  augmenting  path,  that  is,  that  Gy  contains 
no  path  from  s  to  t.  Define 

S  =  {v  e  V  :  there  exists  a  path  from  .v  to  v  in  G/  J 

and  T  =  V  —  S.  The  partition  (S,T)  is  a  cut:  we  have  s  €  S  trivially  and  l  f  S 
because  there  is  no  path  from  s  to  t  in  G/.  Now  consider  a  pair  of  vertices 
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u  €  S  and  v  e  T.  If  (u,v)  e  E,  we  must  have  f(u,v )  =  c(u,v),  since 
otherwise  (u,  v)  e  Ef,  which  would  place  v  in  set  S.  If  (v,u)  €  E,  we  must 
have  f(v,u )  =  0,  because  otherwise  Cf(u,v)  =  f(v,u )  would  be  positive  and 
we  would  have  (u,  v)  €  Ef,  which  would  place  v  in  S.  Of  course,  if  neither  (u,v) 
nor  (v,  u)  is  in  E,  then  f(u,  v)  =  f(v,  u)  =  0.  We  thus  have 

f(S,T)  =  EE  f(u,v)  -  EE  f(v,u) 

ueS  veT  veT  ueS 

=  EEc(m-v)-EEo 

ueS  veT  veT  ueS 

=  c(S,  T)  . 

By Lemma 26.4, therefore, $|f| = f(S, T) = c(S, T)$.

$(3) \Rightarrow (1)$: By Corollary 26.5, $|f| \le c(S, T)$ for all cuts $(S, T)$. The condition
$|f| = c(S, T)$ thus implies that $f$ is a maximum flow. ■
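The set $S$ built in the $(2) \Rightarrow (3)$ step can be computed directly from a maximum flow. The following Python sketch does so by searching the residual network from $s$. It assumes capacities and flows are stored in dictionaries keyed by edge and that the network has no antiparallel edges; the names `min_cut_from_flow`, `capacity`, and `flow`, and the small example network, are illustrative and not part of the text.

```python
from collections import deque


def min_cut_from_flow(vertices, capacity, flow, s):
    """Return (S, T, c(S, T)), where S holds the vertices reachable from s
    in the residual network induced by `flow`."""

    def residual(u, v):
        # Residual capacity: unused capacity on an original edge,
        # or cancellable flow on the reversal of an original edge.
        if (u, v) in capacity:
            return capacity[(u, v)] - flow.get((u, v), 0)
        if (v, u) in capacity:
            return flow.get((v, u), 0)
        return 0

    S = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in vertices:
            if v not in S and residual(u, v) > 0:
                S.add(v)
                queue.append(v)
    T = set(vertices) - S
    cut_capacity = sum(c for (u, v), c in capacity.items() if u in S and v in T)
    return S, T, cut_capacity


# Hypothetical four-vertex network carrying a maximum flow of value 3.
vertices = ["s", "a", "b", "t"]
capacity = {("s", "a"): 2, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 1, ("b", "t"): 2}
flow = {("s", "a"): 1, ("s", "b"): 2, ("a", "b"): 0, ("a", "t"): 1, ("b", "t"): 2}
S, T, cut_capacity = min_cut_from_flow(vertices, capacity, flow, "s")
```

In this example the cut capacity equals the flow value, as condition 3 of the theorem requires when the flow is maximum.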

The  basic  Ford-Fulkerson  algorithm 

In each iteration of the Ford-Fulkerson method, we find some augmenting path $p$
and use $p$ to modify the flow $f$. As Lemma 26.2 and Corollary 26.3 suggest, we
replace $f$ by $f \uparrow f_p$, obtaining a new flow whose value is $|f| + |f_p|$. The following
implementation of the method computes the maximum flow in a flow network
$G = (V, E)$ by updating the flow attribute $(u, v).f$ for each edge $(u, v) \in E$.¹
If $(u, v) \notin E$, we assume implicitly that $(u, v).f = 0$. We also assume that we
are given the capacities $c(u, v)$ along with the flow network, and $c(u, v) = 0$
if $(u, v) \notin E$. We compute the residual capacity $c_f(u, v)$ in accordance with the
formula (26.2). The expression $c_f(p)$ in the code is just a temporary variable that
stores the residual capacity of the path $p$.

FORD-FULKERSON(G, s, t)

1  for each edge (u, v) ∈ G.E
2      (u, v).f = 0
3  while there exists a path p from s to t in the residual network G_f
4      c_f(p) = min {c_f(u, v) : (u, v) is in p}
5      for each edge (u, v) in p
6          if (u, v) ∈ E
7              (u, v).f = (u, v).f + c_f(p)
8          else (v, u).f = (v, u).f - c_f(p)


¹Recall from Section 22.1 that we represent an attribute $f$ for edge $(u, v)$ with the same style of
notation, $(u, v).f$, that we use for an attribute of any other object.

The FORD-FULKERSON algorithm simply expands on the FORD-FULKERSON-METHOD
pseudocode given earlier. Figure 26.6 shows the result of each iteration
in a sample run. Lines 1-2 initialize the flow $f$ to 0. The while loop of lines 3-8
repeatedly finds an augmenting path $p$ in $G_f$ and augments flow $f$ along $p$ by
the residual capacity $c_f(p)$. Each residual edge in path $p$ is either an edge in the
original network or the reversal of an edge in the original network. Lines 6-8
update the flow in each case appropriately, adding flow when the residual edge is
an original edge and subtracting it otherwise. When no augmenting paths exist, the
flow $f$ is a maximum flow.
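As a concrete companion to the pseudocode, here is a minimal Python sketch of the same procedure. It is one possible rendering, not the book's implementation: it assumes integer capacities stored in a dictionary keyed by edge, no antiparallel edges, and a depth-first search for the augmenting path. The example capacities are an assumption chosen so that the maximum flow has value 23.

```python
def ford_fulkerson(capacity, s, t):
    """Basic Ford-Fulkerson using depth-first search for augmenting paths.

    `capacity` maps each edge (u, v) to a nonnegative integer.
    Returns (value, flow), where flow is keyed by the original edges.
    """
    flow = {edge: 0 for edge in capacity}              # lines 1-2
    neighbors = {}
    for u, v in capacity:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)          # reversal may be a residual edge

    def residual(u, v):
        if (u, v) in capacity:                         # original edge: unused capacity
            return capacity[(u, v)] - flow[(u, v)]
        return flow[(v, u)]                            # reversed edge: cancellable flow

    def find_path():
        parent = {s: None}
        stack = [s]
        while stack:
            u = stack.pop()
            for v in neighbors.get(u, ()):
                if v not in parent and residual(u, v) > 0:
                    parent[v] = u
                    stack.append(v)
        if t not in parent:
            return None
        path, v = [], t
        while parent[v] is not None:                   # walk back from t to s
            path.append((parent[v], v))
            v = parent[v]
        return path

    path = find_path()
    while path is not None:                            # line 3
        bottleneck = min(residual(u, v) for u, v in path)   # line 4
        for u, v in path:                              # lines 5-8
            if (u, v) in capacity:
                flow[(u, v)] += bottleneck
            else:
                flow[(v, u)] -= bottleneck
        path = find_path()
    value = (sum(f for (u, _), f in flow.items() if u == s)
             - sum(f for (_, v), f in flow.items() if v == s))
    return value, flow


# Hypothetical capacities (an assumption; the figure itself is not reproduced here).
example_capacity = {
    ("s", "v1"): 16, ("s", "v2"): 13, ("v1", "v3"): 12, ("v2", "v1"): 4,
    ("v2", "v4"): 14, ("v3", "v2"): 9, ("v3", "t"): 20, ("v4", "v3"): 7,
    ("v4", "t"): 4,
}
value, flow = ford_fulkerson(example_capacity, "s", "t")
```

The order in which the search visits neighbors may vary between runs, so the individual edge flows can differ, but the returned value is the same for every maximum flow.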

Analysis  of  Ford-Fulkerson 

The  running  time  of  FORD-FULKERSON  depends  on  how  we  find  the  augmenting 
path  p  in  line  3.  If  we  choose  it  poorly,  the  algorithm  might  not  even  terminate:  the 
value of the flow will increase with successive augmentations, but it need not even
converge to the maximum flow value.² If we find the augmenting path by using a
breadth-first  search  (which  we  saw  in  Section  22.2),  however,  the  algorithm  runs  in 
polynomial  time.  Before  proving  this  result,  we  obtain  a  simple  bound  for  the  case 
in  which  we  choose  the  augmenting  path  arbitrarily  and  all  capacities  are  integers. 

In practice, the maximum-flow problem often arises with integral capacities. If
the capacities are rational numbers, we can apply an appropriate scaling transformation
to make them all integral. If $f^*$ denotes a maximum flow in the transformed
network, then a straightforward implementation of FORD-FULKERSON executes
the while loop of lines 3-8 at most $|f^*|$ times, since the flow value increases by at
least one unit in each iteration.
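One way to realize the scaling transformation is to multiply every capacity by the least common multiple of the denominators. The sketch below assumes the capacities are given as exact rationals (Python `Fraction` values); the function name `scale_to_integers` is illustrative.

```python
from fractions import Fraction
from math import gcd


def scale_to_integers(capacity):
    """Scale rational capacities by the LCM of their denominators.

    Returns (scale, scaled_capacity). A flow of value x in the scaled
    network corresponds to a flow of value x / scale in the original.
    """
    fractions = {edge: Fraction(c) for edge, c in capacity.items()}
    scale = 1
    for c in fractions.values():
        scale = scale * c.denominator // gcd(scale, c.denominator)
    scaled = {edge: int(c * scale) for edge, c in fractions.items()}
    return scale, scaled


rational_capacity = {
    ("s", "a"): Fraction(3, 2),
    ("a", "t"): Fraction(2, 3),
    ("s", "t"): Fraction(1, 4),
}
scale, scaled = scale_to_integers(rational_capacity)
```

Because the iteration bound is $|f^*|$ in the transformed network, a large scale factor enlarges the bound accordingly.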

We can perform the work done within the while loop efficiently if we implement
the flow network $G = (V, E)$ with the right data structure and find an augmenting
path by a linear-time algorithm. Let us assume that we keep a data structure corresponding
to a directed graph $G' = (V, E')$, where
$E' = \{(u, v) : (u, v) \in E \text{ or } (v, u) \in E\}$. Edges in the network $G$ are also
edges in $G'$, and therefore we can
easily maintain capacities and flows in this data structure. Given a flow $f$ on $G$,
the edges in the residual network $G_f$ consist of all edges $(u, v)$ of $G'$ such that
$c_f(u, v) > 0$, where $c_f$ conforms to equation (26.2). The time to find a path in
a residual network is therefore $O(V + E') = O(E)$ if we use either depth-first
search or breadth-first search. Each iteration of the while loop thus takes $O(E)$
time, as does the initialization in lines 1-2, making the total running time of the
FORD-FULKERSON algorithm $O(E \, |f^*|)$.


²The Ford-Fulkerson method might fail to terminate only if edge capacities are irrational numbers.

Figure 26.6 The execution of the basic Ford-Fulkerson algorithm. (a)-(e) Successive iterations of
the while loop. The left side of each part shows the residual network $G_f$ from line 3 with a shaded
augmenting path $p$. The right side of each part shows the new flow $f$ that results from augmenting $f$
by $f_p$. The residual network in (a) is the input network $G$.

When the capacities are integral and the optimal flow value $|f^*|$ is small, the
running time of the FORD-FULKERSON algorithm is good. Figure 26.7(a) shows an example
of what can happen on a simple flow network for which $|f^*|$ is large. A maximum
flow in this network has value 2,000,000: 1,000,000 units of flow traverse
the path $s \to u \to t$, and another 1,000,000 units traverse the path $s \to v \to t$. If
the first augmenting path found by FORD-FULKERSON is $s \to u \to v \to t$, shown
in Figure 26.7(a), the flow has value 1 after the first iteration. The resulting residual
network appears in Figure 26.7(b). If the second iteration finds the augmenting
path $s \to v \to u \to t$, as shown in Figure 26.7(b), the flow then has value 2.
Figure 26.7(c) shows the resulting residual network. We can continue, choosing
the augmenting path $s \to u \to v \to t$ in the odd-numbered iterations and the augmenting
path $s \to v \to u \to t$ in the even-numbered iterations. We would perform
a total of 2,000,000 augmentations, increasing the flow value by only 1 unit in each.

Figure 26.6, continued (f) The residual network at the last while loop test. It has no augmenting
paths, and the flow $f$ shown in (e) is therefore a maximum flow. The value of the maximum flow
found is 23.
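The alternating schedule described for Figure 26.7 can be replayed mechanically. The following sketch assumes a network with capacity `big` on the four outer edges and capacity 1 on the edge $(u, v)$, which is our reading of the figure, and forces the two bad paths in alternation. The function name `count_bad_augmentations` is hypothetical.

```python
def count_bad_augmentations(big):
    """Alternate s->u->v->t and s->v->u->t until neither path can be augmented.

    Returns (number of augmentations, final flow value).
    """
    capacity = {("s", "u"): big, ("s", "v"): big,
                ("u", "t"): big, ("v", "t"): big, ("u", "v"): 1}
    flow = {edge: 0 for edge in capacity}

    def residual(a, b):
        if (a, b) in capacity:
            return capacity[(a, b)] - flow[(a, b)]
        return flow[(b, a)]            # reversal of an original edge

    paths = [
        [("s", "u"), ("u", "v"), ("v", "t")],   # odd-numbered iterations
        [("s", "v"), ("v", "u"), ("u", "t")],   # even-numbered iterations
    ]
    count = 0
    while True:
        path = paths[count % 2]
        bottleneck = min(residual(a, b) for a, b in path)
        if bottleneck == 0:
            break
        for a, b in path:
            if (a, b) in capacity:
                flow[(a, b)] += bottleneck
            else:
                flow[(b, a)] -= bottleneck
        count += 1
    return count, flow[("s", "u")] + flow[("s", "v")]


augmentations, value = count_bad_augmentations(1000)
```

With `big = 1000` the loop performs 2000 augmentations of one unit each, mirroring the 2,000,000 augmentations in the text at a smaller scale.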

The  Edmonds-Karp  algorithm 

We  can  improve  the  bound  on  Ford-Fulkerson  by  finding  the  augmenting 
path  p  in  line  3  with  a  breadth-first  search.  That  is,  we  choose  the  augmenting 
path  as  a  shortest  path  from  s  to  t  in  the  residual  network,  where  each  edge  has 
unit  distance  (weight).  We  call  the  Ford-Fulkerson  method  so  implemented  the 
Edmonds-Karp algorithm. We now prove that the Edmonds-Karp algorithm runs
in $O(VE^2)$ time.

The analysis depends on the distances to vertices in the residual network $G_f$.
The following lemma uses the notation $\delta_f(u, v)$ for the shortest-path distance
from $u$ to $v$ in $G_f$, where each edge has unit distance.

Lemma 26.7

If the Edmonds-Karp algorithm is run on a flow network $G = (V, E)$ with source $s$
and sink $t$, then for all vertices $v \in V - \{s, t\}$, the shortest-path distance $\delta_f(s, v)$
in the residual network $G_f$ increases monotonically with each flow augmentation.

Figure 26.7 (a) A flow network for which FORD-FULKERSON can take $\Theta(E \, |f^*|)$ time,
where $f^*$ is a maximum flow, shown here with $|f^*| = 2{,}000{,}000$. The shaded path is an
augmenting path with residual capacity 1. (b) The resulting residual network, with another augmenting
path whose residual capacity is 1. (c) The resulting residual network.

Proof We will suppose that for some vertex $v \in V - \{s, t\}$, there is a flow augmentation
that causes the shortest-path distance from $s$ to $v$ to decrease, and then
we will derive a contradiction. Let $f$ be the flow just before the first augmentation
that decreases some shortest-path distance, and let $f'$ be the flow just afterward.
Let $v$ be the vertex with the minimum $\delta_{f'}(s, v)$ whose distance was decreased by
the augmentation, so that $\delta_{f'}(s, v) < \delta_f(s, v)$. Let $p = s \leadsto u \to v$ be a shortest
path from $s$ to $v$ in $G_{f'}$, so that $(u, v) \in E_{f'}$ and

$$\delta_{f'}(s, u) = \delta_{f'}(s, v) - 1 . \tag{26.12}$$

Because of how we chose $v$, we know that the distance of vertex $u$ from the source $s$
did not decrease, i.e.,

$$\delta_{f'}(s, u) \ge \delta_f(s, u) . \tag{26.13}$$

We claim that $(u, v) \notin E_f$. Why? If we had $(u, v) \in E_f$, then we would also have

$$
\begin{aligned}
\delta_f(s, v) &\le \delta_f(s, u) + 1 && \text{(by Lemma 24.10, the triangle inequality)} \\
               &\le \delta_{f'}(s, u) + 1 && \text{(by inequality (26.13))} \\
               &= \delta_{f'}(s, v) && \text{(by equation (26.12))} ,
\end{aligned}
$$

which contradicts our assumption that $\delta_{f'}(s, v) < \delta_f(s, v)$.

How can we have $(u, v) \notin E_f$ and $(u, v) \in E_{f'}$? The augmentation must
have increased the flow from $v$ to $u$. The Edmonds-Karp algorithm always augments
flow along shortest paths, and therefore the shortest path from $s$ to $u$ in $G_f$
has $(v, u)$ as its last edge. Therefore,

$$
\begin{aligned}
\delta_f(s, v) &= \delta_f(s, u) - 1 \\
               &\le \delta_{f'}(s, u) - 1 && \text{(by inequality (26.13))} \\
               &= \delta_{f'}(s, v) - 2 && \text{(by equation (26.12))} ,
\end{aligned}
$$

which contradicts our assumption that $\delta_{f'}(s, v) < \delta_f(s, v)$. We conclude that our
assumption that such a vertex $v$ exists is incorrect. ■

The next theorem bounds the number of iterations of the Edmonds-Karp algorithm.

Theorem 26.8

If the Edmonds-Karp algorithm is run on a flow network $G = (V, E)$ with source $s$
and sink $t$, then the total number of flow augmentations performed by the algorithm
is $O(VE)$.

Proof We say that an edge $(u, v)$ in a residual network $G_f$ is critical on an augmenting
path $p$ if the residual capacity of $p$ is the residual capacity of $(u, v)$, that
is, if $c_f(p) = c_f(u, v)$. After we have augmented flow along an augmenting path,
any critical edge on the path disappears from the residual network. Moreover, at
least one edge on any augmenting path must be critical. We will show that each of
the $|E|$ edges can become critical at most $|V|/2$ times.

Let $u$ and $v$ be vertices in $V$ that are connected by an edge in $E$. Since augmenting
paths are shortest paths, when $(u, v)$ is critical for the first time, we have

$$\delta_f(s, v) = \delta_f(s, u) + 1 .$$

Once the flow is augmented, the edge $(u, v)$ disappears from the residual network.
It cannot reappear later on another augmenting path until after the flow from $u$ to $v$
is decreased, which occurs only if $(v, u)$ appears on an augmenting path. If $f'$ is
the flow in $G$ when this event occurs, then we have

$$\delta_{f'}(s, u) = \delta_{f'}(s, v) + 1 .$$

Since $\delta_f(s, v) \le \delta_{f'}(s, v)$ by Lemma 26.7, we have

$$
\begin{aligned}
\delta_{f'}(s, u) &= \delta_{f'}(s, v) + 1 \\
                  &\ge \delta_f(s, v) + 1 \\
                  &= \delta_f(s, u) + 2 .
\end{aligned}
$$

Consequently, from the time $(u, v)$ becomes critical to the time when it next
becomes critical, the distance of $u$ from the source increases by at least 2. The
distance of $u$ from the source is initially at least 0. The intermediate vertices on a
shortest path from $s$ to $u$ cannot contain $s$, $u$, or $t$ (since $(u, v)$ on an augmenting
path implies that $u \ne t$). Therefore, until $u$ becomes unreachable from the source,
if ever, its distance is at most $|V| - 2$. Thus, after the first time that $(u, v)$ becomes
critical, it can become critical at most $(|V| - 2)/2 = |V|/2 - 1$ times more, for a
total of at most $|V|/2$ times. Since there are $O(E)$ pairs of vertices that can have an
edge between them in a residual network, the total number of critical edges during
the entire execution of the Edmonds-Karp algorithm is $O(VE)$. Each augmenting
path has at least one critical edge, and hence the theorem follows. ■

Because we can implement each iteration of FORD-FULKERSON in $O(E)$ time
when we find the augmenting path by breadth-first search, the total running time of
the Edmonds-Karp algorithm is $O(VE^2)$. We shall see that push-relabel algorithms
can yield even better bounds. The algorithm of Section 26.4 gives a method for
achieving an $O(V^2 E)$ running time, which forms the basis for the $O(V^3)$-time
algorithm of Section 26.5.
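To make the effect of the shortest-path rule concrete, the following Python sketch implements the breadth-first variant and counts augmentations. It assumes integer capacities in a dictionary keyed by edge and no antiparallel edges. The test network, with capacity 1,000,000 on the outer edges and 1 on $(u, v)$, is our reading of Figure 26.7.

```python
from collections import deque


def edmonds_karp(capacity, s, t):
    """Ford-Fulkerson with breadth-first search for the augmenting path.

    Returns (flow value, number of augmentations performed).
    """
    flow = {edge: 0 for edge in capacity}
    adjacent = {}
    for u, v in capacity:
        adjacent.setdefault(u, []).append(v)
        adjacent.setdefault(v, []).append(u)     # reversal may be a residual edge

    def residual(u, v):
        if (u, v) in capacity:
            return capacity[(u, v)] - flow[(u, v)]
        return flow[(v, u)]

    augmentations = 0
    while True:
        # BFS discovers t first along a path with the fewest edges.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adjacent.get(u, []):
                if v not in parent and residual(u, v) > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual(u, v) for u, v in path)
        for u, v in path:
            if (u, v) in capacity:
                flow[(u, v)] += bottleneck
            else:
                flow[(v, u)] -= bottleneck
        augmentations += 1
    value = (sum(f for (u, _), f in flow.items() if u == s)
             - sum(f for (_, v), f in flow.items() if v == s))
    return value, augmentations


bad_network = {("s", "u"): 1_000_000, ("s", "v"): 1_000_000,
               ("u", "t"): 1_000_000, ("v", "t"): 1_000_000, ("u", "v"): 1}
value, augmentations = edmonds_karp(bad_network, "s", "t")
```

On this network the two-edge paths are always found before the three-edge ones, so two augmentations suffice where the alternating schedule needed 2,000,000.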

Exercises 


26.2-1 

Prove that the summations in equation (26.6) equal the summations in
equation (26.7).


26.2-2 

In Figure 26.1(b), what is the flow across the cut $(\{s, v_2, v_4\}, \{v_1, v_3, t\})$? What is
the capacity of this cut?


26.2-3 

Show the execution of the Edmonds-Karp algorithm on the flow network of
Figure 26.1(a).


26.2-4 

In the example of Figure 26.6, what is the minimum cut corresponding to the maximum
flow shown? Of the augmenting paths appearing in the example, which one
cancels flow?


26.2-5 

Recall that the construction in Section 26.1 that converts a flow network with multiple
sources and sinks into a single-source, single-sink network adds edges with
infinite capacity. Prove that any flow in the resulting network has a finite value
if the edges of the original network with multiple sources and sinks have finite
capacity.


26.2-6 

Suppose that each source $s_i$ in a flow network with multiple sources and sinks
produces exactly $p_i$ units of flow, so that $\sum_{v \in V} f(s_i, v) = p_i$. Suppose also
that each sink $t_j$ consumes exactly $q_j$ units, so that $\sum_{v \in V} f(v, t_j) = q_j$, where
$\sum_i p_i = \sum_j q_j$. Show how to convert the problem of finding a flow $f$ that obeys
these additional constraints into the problem of finding a maximum flow in a
single-source, single-sink flow network.


26.2-7 

Prove  Lemma  26.2. 


26.2-8 

Suppose  that  we  redefine  the  residual  network  to  disallow  edges  into  s.  Argue  that 
the  procedure  Ford-Fulkerson  still  correctly  computes  a  maximum  flow. 


26.2-9

Suppose that both $f$ and $f'$ are flows in a network $G$ and we compute flow $f \uparrow f'$.
Does the augmented flow satisfy the flow conservation property? Does it satisfy
the capacity constraint?

26.2-10

Show how to find a maximum flow in a network $G = (V, E)$ by a sequence of at
most $|E|$ augmenting paths. (Hint: Determine the paths after finding the maximum
flow.)

26.2-11 

The  edge  connectivity  of  an  undirected  graph  is  the  minimum  number  k  of  edges 
that  must  be  removed  to  disconnect  the  graph.  For  example,  the  edge  connectivity 
of  a  tree  is  1,  and  the  edge  connectivity  of  a  cyclic  chain  of  vertices  is  2.  Show 
how to determine the edge connectivity of an undirected graph $G = (V, E)$ by
running a maximum-flow algorithm on at most $|V|$ flow networks, each having
$O(V)$ vertices and $O(E)$ edges.

26.2-12 

Suppose  that  you  are  given  a  flow  network  G,  and  G  has  edges  entering  the 
source $s$. Let $f$ be a flow in $G$ in which one of the edges $(v, s)$ entering the source
has $f(v, s) = 1$. Prove that there must exist another flow $f'$ with $f'(v, s) = 0$
such that $|f| = |f'|$. Give an $O(E)$-time algorithm to compute $f'$, given $f$, and
assuming that all edge capacities are integers.

26.2-13 

Suppose that you wish to find, among all minimum cuts in a flow network $G$ with
integral capacities, one that contains the smallest number of edges. Show how to
modify the capacities of $G$ to create a new flow network $G'$ in which any minimum
cut in $G'$ is a minimum cut with the smallest number of edges in $G$.

26.3 Maximum bipartite matching

Some combinatorial problems can easily be cast as maximum-flow problems. The
multiple-source, multiple-sink maximum-flow problem from Section 26.1 gave us
one example. Some other combinatorial problems seem on the surface to have little
to do with flow networks, but can in fact be reduced to maximum-flow problems.
This section presents one such problem: finding a maximum matching in a bipartite
graph. In order to solve this problem, we shall take advantage of an integrality
property provided by the Ford-Fulkerson method. We shall also see how to use
the Ford-Fulkerson method to solve the maximum-bipartite-matching problem on
a graph $G = (V, E)$ in $O(VE)$ time.

The  maximum-bipartite-matching  problem 

Given an undirected graph $G = (V, E)$, a matching is a subset of edges $M \subseteq E$
such that for all vertices $v \in V$, at most one edge of $M$ is incident on $v$. We
say that a vertex $v \in V$ is matched by the matching $M$ if some edge in $M$ is
incident on $v$; otherwise, $v$ is unmatched. A maximum matching is a matching
of maximum cardinality, that is, a matching $M$ such that for any matching $M'$,
we have $|M| \ge |M'|$. In this section, we shall restrict our attention to finding
maximum matchings in bipartite graphs: graphs in which the vertex set can be
partitioned into $V = L \cup R$, where $L$ and $R$ are disjoint and all edges in $E$
go between $L$ and $R$. We further assume that every vertex in $V$ has at least one
incident edge. Figure 26.8 illustrates the notion of a matching in a bipartite graph.

The problem of finding a maximum matching in a bipartite graph has many
practical applications. As an example, we might consider matching a set $L$ of machines
with a set $R$ of tasks to be performed simultaneously. We take the presence
of edge $(u, v)$ in $E$ to mean that a particular machine $u \in L$ is capable of performing
a particular task $v \in R$. A maximum matching provides work for as many
machines as possible.

Finding  a  maximum  bipartite  matching 

Figure 26.8 A bipartite graph $G = (V, E)$ with vertex partition $V = L \cup R$. (a) A matching
with cardinality 2, indicated by shaded edges. (b) A maximum matching with cardinality 3. (c) The
corresponding flow network $G'$ with a maximum flow shown. Each edge has unit capacity. Shaded
edges have a flow of 1, and all other edges carry no flow. The shaded edges from $L$ to $R$ correspond
to those in the maximum matching from (b).

We can use the Ford-Fulkerson method to find a maximum matching in an undirected
bipartite graph $G = (V, E)$ in time polynomial in $|V|$ and $|E|$. The trick is
to construct a flow network in which flows correspond to matchings, as shown in
Figure 26.8(c). We define the corresponding flow network $G' = (V', E')$ for the
bipartite graph $G$ as follows. We let the source $s$ and sink $t$ be new vertices not
in $V$, and we let $V' = V \cup \{s, t\}$. If the vertex partition of $G$ is $V = L \cup R$, the
directed edges of $G'$ are the edges of $E$, directed from $L$ to $R$, along with $|V|$ new
directed edges:

$$E' = \{(s, u) : u \in L\} \cup \{(u, v) : (u, v) \in E\} \cup \{(v, t) : v \in R\} .$$

To complete the construction, we assign unit capacity to each edge in $E'$. Since
each vertex in $V$ has at least one incident edge, $|E| \ge |V|/2$. Thus,
$|E| \le |E'| = |E| + |V| \le 3|E|$, and so $|E'| = \Theta(E)$.
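The construction of $G'$ translates almost line for line into code. The sketch below builds the unit-capacity edge set $E'$ as a dictionary. The function name `matching_flow_network` and the default labels `"s"` and `"t"` are assumptions, and the labels must not already occur in $V$.

```python
def matching_flow_network(L, R, edges, source="s", sink="t"):
    """Build the unit-capacity flow network G' for a bipartite graph.

    `edges` holds pairs (u, v) with u in L and v in R. The result maps
    each directed edge of E' to its capacity, which is always 1.
    """
    capacity = {}
    for u in L:
        capacity[(source, u)] = 1      # (s, u) for every u in L
    for u, v in edges:
        capacity[(u, v)] = 1           # original edges, directed from L to R
    for v in R:
        capacity[(v, sink)] = 1        # (v, t) for every v in R
    return capacity


# Hypothetical bipartite graph in which every vertex has an incident edge.
L = [1, 2, 3]
R = [4, 5]
E = [(1, 4), (2, 4), (3, 5)]
network = matching_flow_network(L, R, E)
```

Here $|E'| = |E| + |V| = 3 + 5 = 8$, which lies between $|E| = 3$ and $3|E| = 9$ as the bound above requires.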

The following lemma shows that a matching in $G$ corresponds directly to a flow
in $G$'s corresponding flow network $G'$. We say that a flow $f$ on a flow network
$G = (V, E)$ is integer-valued if $f(u, v)$ is an integer for all $(u, v) \in V \times V$.

Lemma  26.9 

Let $G = (V, E)$ be a bipartite graph with vertex partition $V = L \cup R$, and let
$G' = (V', E')$ be its corresponding flow network. If $M$ is a matching in $G$, then
there is an integer-valued flow $f$ in $G'$ with value $|f| = |M|$. Conversely, if $f$
is an integer-valued flow in $G'$, then there is a matching $M$ in $G$ with cardinality
$|M| = |f|$.

Proof We first show that a matching $M$ in $G$ corresponds to an integer-valued
flow $f$ in $G'$. Define $f$ as follows. If $(u, v) \in M$, then $f(s, u) = f(u, v) =
f(v, t) = 1$. For all other edges $(u, v) \in E'$, we define $f(u, v) = 0$. It is simple
to verify that $f$ satisfies the capacity constraint and flow conservation.

Intuitively, each edge $(u, v) \in M$ corresponds to one unit of flow in $G'$ that
traverses the path $s \to u \to v \to t$. Moreover, the paths induced by edges in $M$
are vertex-disjoint, except for $s$ and $t$. The net flow across cut $(L \cup \{s\}, R \cup \{t\})$
is equal to $|M|$; thus, by Lemma 26.4, the value of the flow is $|f| = |M|$.

To prove the converse, let $f$ be an integer-valued flow in $G'$, and let

$$M = \{(u, v) : u \in L, v \in R, \text{ and } f(u, v) > 0\} .$$

Each vertex $u \in L$ has only one entering edge, namely $(s, u)$, and its capacity
is 1. Thus, each $u \in L$ has at most one unit of flow entering it, and if one unit of
flow does enter, by flow conservation, one unit of flow must leave. Furthermore,
since $f$ is integer-valued, for each $u \in L$, the one unit of flow can enter on at most
one edge and can leave on at most one edge. Thus, one unit of flow enters $u$ if and
only if there is exactly one vertex $v \in R$ such that $f(u, v) = 1$, and at most one
edge leaving each $u \in L$ carries positive flow. A symmetric argument applies to
each $v \in R$. The set $M$ is therefore a matching.

To see that $|M| = |f|$, observe that for every matched vertex $u \in L$, we have
$f(s, u) = 1$, and for every edge $(u, v) \in E - M$, we have $f(u, v) = 0$. Consequently,
$f(L \cup \{s\}, R \cup \{t\})$, the net flow across cut $(L \cup \{s\}, R \cup \{t\})$, is equal
to $|M|$. Applying Lemma 26.4, we have that $|f| = f(L \cup \{s\}, R \cup \{t\}) = |M|$. ■

Based on Lemma 26.9, we would like to conclude that a maximum matching
in a bipartite graph $G$ corresponds to a maximum flow in its corresponding flow
network $G'$, and we can therefore compute a maximum matching in $G$ by running
a maximum-flow algorithm on $G'$. The only hitch in this reasoning is that the
maximum-flow algorithm might return a flow in $G'$ for which some $f(u, v)$ is
not an integer, even though the flow value $|f|$ must be an integer. The following
theorem shows that if we use the Ford-Fulkerson method, this difficulty cannot
arise.

Theorem 26.10 (Integrality theorem)

If the capacity function $c$ takes on only integral values, then the maximum flow $f$
produced by the Ford-Fulkerson method has the property that $|f|$ is an integer.
Moreover, for all vertices $u$ and $v$, the value of $f(u, v)$ is an integer.

Proof The proof is by induction on the number of iterations. We leave it as
Exercise 26.3-2. ■

We can now prove the following corollary to Lemma 26.9.

Corollary 26.11

The cardinality of a maximum matching $M$ in a bipartite graph $G$ equals the value
of a maximum flow $f$ in its corresponding flow network $G'$.

Proof We use the nomenclature from Lemma 26.9. Suppose that $M$ is a maximum
matching in $G$ and that the corresponding flow $f$ in $G'$ is not maximum.
Then there is a maximum flow $f'$ in $G'$ such that $|f'| > |f|$. Since the capacities
in $G'$ are integer-valued, by Theorem 26.10, we can assume that $f'$ is
integer-valued. Thus, $f'$ corresponds to a matching $M'$ in $G$ with cardinality
$|M'| = |f'| > |f| = |M|$, contradicting our assumption that $M$ is a maximum
matching. In a similar manner, we can show that if $f$ is a maximum flow in $G'$, its
corresponding matching is a maximum matching on $G$. ■

Thus, given a bipartite undirected graph $G$, we can find a maximum matching by
creating the flow network $G'$, running the Ford-Fulkerson method, and directly obtaining
a maximum matching $M$ from the integer-valued maximum flow $f$ found.
Since any matching in a bipartite graph has cardinality at most $\min(|L|, |R|) = O(V)$,
the value of the maximum flow in $G'$ is $O(V)$. We can therefore find a maximum
matching in a bipartite graph in time $O(VE') = O(VE)$, since $|E'| = \Theta(E)$.
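Because every capacity in $G'$ is 1, the Ford-Fulkerson method on $G'$ can be written without materializing $s$ and $t$: each augmenting path starts at an unmatched vertex of $L$ and alternates between unmatched and matched edges. The sketch below uses this common specialization, which is our choice of presentation rather than code from the text. The adjacency list `adj` and the example graph are hypothetical.

```python
def max_bipartite_matching(L, adj):
    """Return a maximum matching as a dict mapping v in R to its partner in L.

    adj[u] lists the neighbors in R of vertex u in L. Each successful call
    to try_augment corresponds to one unit-capacity augmenting path.
    """
    match_of = {}                      # v in R -> u in L currently matched to v

    def try_augment(u, visited):
        for v in adj.get(u, []):
            if v in visited:
                continue
            visited.add(v)
            # v is free, or v's current partner can be rematched elsewhere
            # (this cancels one unit of flow on the edge into v).
            if v not in match_of or try_augment(match_of[v], visited):
                match_of[v] = u
                return True
        return False

    for u in L:                        # at most |L| augmentations, O(E) each
        try_augment(u, set())
    return match_of


# Hypothetical graph with L = {1, ..., 5} and R = {6, ..., 9}.
adj = {1: [6], 2: [6, 8], 3: [7, 8, 9], 4: [8], 5: [8]}
matching = max_bipartite_matching([1, 2, 3, 4, 5], adj)
```

In this example vertices 1, 2, 4, and 5 compete for only two vertices of $R$, so no matching can have more than three edges, and the sketch finds one of that size.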

Exercises 


26.3-1 

Run  the  Ford-Fulkerson  algorithm  on  the  flow  network  in  Figure  26.8(c)  and  show 
the  residual  network  after  each  flow  augmentation.  Number  the  vertices  in  L  top 
to  bottom  from  1  to  5  and  in  R  top  to  bottom  from  6  to  9.  For  each  iteration,  pick 
the  augmenting  path  that  is  lexicographically  smallest. 


26.3-2 

Prove  Theorem  26.10. 


26.3-3

Let $G = (V, E)$ be a bipartite graph with vertex partition $V = L \cup R$, and let $G'$
be its corresponding flow network. Give a good upper bound on the length of any
augmenting path found in $G'$ during the execution of FORD-FULKERSON.

26.3-4 *

A perfect matching is a matching in which every vertex is matched. Let $G =
(V, E)$ be an undirected bipartite graph with vertex partition $V = L \cup R$, where
$|L| = |R|$. For any $X \subseteq V$, define the neighborhood of $X$ as

$$N(X) = \{y \in V : (x, y) \in E \text{ for some } x \in X\} ,$$

that is, the set of vertices adjacent to some member of $X$. Prove Hall’s theorem:
there exists a perfect matching in $G$ if and only if $|A| \le |N(A)|$ for every subset
$A \subseteq L$.

26.3-5  * 

We say that a bipartite graph $G = (V, E)$, where $V = L \cup R$, is $d$-regular if every
vertex $v \in V$ has degree exactly $d$. Every $d$-regular bipartite graph has $|L| = |R|$.
Prove that every $d$-regular bipartite graph has a matching of cardinality $|L|$ by
arguing that a minimum cut of the corresponding flow network has capacity $|L|$.


★  26.4  Push-relabel  algorithms 


In this section, we present the “push-relabel” approach to computing maximum
flows. To date, many of the asymptotically fastest maximum-flow algorithms are
push-relabel algorithms, and the fastest actual implementations of maximum-flow
algorithms are based on the push-relabel method. Push-relabel methods also efficiently
solve other flow problems, such as the minimum-cost flow problem. This
section introduces Goldberg’s “generic” maximum-flow algorithm, which has a
simple implementation that runs in $O(V^2 E)$ time, thereby improving upon the
$O(VE^2)$ bound of the Edmonds-Karp algorithm. Section 26.5 refines the generic
algorithm to obtain another push-relabel algorithm that runs in $O(V^3)$ time.

Push-relabel algorithms work in a more localized manner than the Ford-Fulkerson
method. Rather than examine the entire residual network to find an augmenting
path, push-relabel algorithms work on one vertex at a time, looking only
at the vertex’s neighbors in the residual network. Furthermore, unlike the Ford-Fulkerson
method, push-relabel algorithms do not maintain the flow-conservation
property throughout their execution. They do, however, maintain a preflow, which
is a function $f : V \times V \to \mathbb{R}$ that satisfies the capacity constraint and the following
relaxation of flow conservation:

$$\sum_{v \in V} f(v, u) - \sum_{v \in V} f(u, v) \ge 0$$

for all vertices $u \in V - \{s\}$. That is, the flow into a vertex may exceed the flow
out. We call the quantity

$$e(u) = \sum_{v \in V} f(v, u) - \sum_{v \in V} f(u, v) \tag{26.14}$$

the excess flow into vertex $u$. The excess at a vertex is the amount by which the
flow in exceeds the flow out. We say that a vertex $u \in V - \{s, t\}$ is overflowing if
$e(u) > 0$.
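The definitions of excess and overflowing vertices can be checked on a small preflow. The sketch below assumes the preflow is stored as a dictionary keyed by edge; the function names and the example values are illustrative.

```python
def excess(preflow, u):
    """e(u): total flow into u minus total flow out of u."""
    inflow = sum(f for (a, b), f in preflow.items() if b == u)
    outflow = sum(f for (a, b), f in preflow.items() if a == u)
    return inflow - outflow


def overflowing(vertices, preflow, s, t):
    """Vertices other than s and t whose excess is positive."""
    return [u for u in vertices if u not in (s, t) and excess(preflow, u) > 0]


# Hypothetical preflow: vertex a receives 3 units but forwards only 1.
preflow = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 1, ("b", "t"): 2}
over = overflowing(["s", "a", "b", "t"], preflow, "s", "t")
```

Here $e(a) = 2$ and $e(b) = 0$, so only $a$ is overflowing. The function is a preflow rather than a flow because conservation fails at $a$.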


We  shall  begin  this  section  by  describing  the  intuition  behind  the  push-relabel 
method.  We  shall  then  investigate  the  two  operations  employed  by  the  method: 
“pushing”  preflow  and  “relabeling”  a  vertex.  Finally,  we  shall  present  a  generic 
push-relabel  algorithm  and  analyze  its  correctness  and  running  time. 

Intuition 

You can understand the intuition behind the push-relabel method in terms of fluid
flows: we consider a flow network $G = (V, E)$ to be a system of interconnected
pipes of given capacities. Applying this analogy to the Ford-Fulkerson method,
we might say that each augmenting path in the network gives rise to an additional
stream of fluid, with no branch points, flowing from the source to the sink. The
Ford-Fulkerson method iteratively adds more streams of flow until no more can be
added.

The generic push-relabel algorithm has a rather different intuition. As before,
directed edges correspond to pipes. Vertices, which are pipe junctions, have two
interesting properties. First, to accommodate excess flow, each vertex has an outflow
pipe leading to an arbitrarily large reservoir that can accumulate fluid. Second,
each vertex, its reservoir, and all its pipe connections sit on a platform whose height
increases as the algorithm progresses.

Vertex  heights  determine  how  flow  is  pushed:  we  push  flow  only  downhill,  that 
is,  from  a  higher  vertex  to  a  lower  vertex.  The  flow  from  a  lower  vertex  to  a  higher 
vertex  may  be  positive,  but  operations  that  push  flow  push  it  only  downhill.  We 
fix  the  height  of  the  source  at  |  V  |  and  the  height  of  the  sink  at  0.  All  other  vertex 
heights  start  at  0  and  increase  with  time.  The  algorithm  first  sends  as  much  flow  as 
possible  downhill  from  the  source  toward  the  sink.  The  amount  it  sends  is  exactly 
enough  to  fill  each  outgoing  pipe  from  the  source  to  capacity;  that  is,  it  sends  the 
capacity of the cut (s, V − {s}). When flow first enters an intermediate vertex, it
collects  in  the  vertex’s  reservoir.  From  there,  we  eventually  push  it  downhill. 

We  may  eventually  find  that  the  only  pipes  that  leave  a  vertex  u  and  are  not 
already  saturated  with  flow  connect  to  vertices  that  are  on  the  same  level  as  u  or 
are  uphill  from  u.  In  this  case,  to  rid  an  overflowing  vertex  u  of  its  excess  flow,  we 
must  increase  its  height— an  operation  called  “relabeling”  vertex  u.  We  increase 
its  height  to  one  unit  more  than  the  height  of  the  lowest  of  its  neighbors  to  which 
it  has  an  unsaturated  pipe.  After  a  vertex  is  relabeled,  therefore,  it  has  at  least  one 
outgoing  pipe  through  which  we  can  push  more  flow. 

Eventually,  all  the  flow  that  can  possibly  get  through  to  the  sink  has  arrived  there. 
No  more  can  arrive,  because  the  pipes  obey  the  capacity  constraints;  the  amount  of 
flow  across  any  cut  is  still  limited  by  the  capacity  of  the  cut.  To  make  the  preflow 
a  “legal”  flow,  the  algorithm  then  sends  the  excess  collected  in  the  reservoirs  of 
overflowing vertices back to the source by continuing to relabel vertices to above
the fixed height |V| of the source. As we shall see, once we have emptied all the
reservoirs,  the  preflow  is  not  only  a  “legal”  flow,  it  is  also  a  maximum  flow. 

The  basic  operations 

From  the  preceding  discussion,  we  see  that  a  push-relabel  algorithm  performs  two 
basic  operations:  pushing  flow  excess  from  a  vertex  to  one  of  its  neighbors  and 
relabeling  a  vertex.  The  situations  in  which  these  operations  apply  depend  on  the 
heights  of  vertices,  which  we  now  define  precisely. 

Let G = (V, E) be a flow network with source s and sink t, and let f be a
preflow in G. A function h : V → ℕ is a height function³ if h(s) = |V|,
h(t) = 0, and

h(u) ≤ h(v) + 1

for every residual edge (u, v) ∈ E_f. We immediately obtain the following lemma.

Lemma  26.12 

Let G = (V, E) be a flow network, let f be a preflow in G, and let h be a height
function on V. For any two vertices u, v ∈ V, if h(u) > h(v) + 1, then (u, v) is
not an edge in the residual network. ■
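The definition can be checked mechanically. The following is a minimal sketch, assuming a hypothetical representation in which `cap` and `flow` are dictionaries keyed by edge and the network has no antiparallel edges; the helper names are illustrative and do not come from the text.

```python
def residual_capacity(cap, flow, u, v):
    """c_f(u, v) for a network without antiparallel edges (an assumption)."""
    if (u, v) in cap:
        return cap[(u, v)] - flow[(u, v)]
    if (v, u) in cap:
        return flow[(v, u)]
    return 0


def is_height_function(vertices, cap, flow, h, s, t):
    """Check h(s) = |V|, h(t) = 0, and h(u) <= h(v) + 1 on every residual edge."""
    if h[s] != len(vertices) or h[t] != 0:
        return False
    for u in vertices:
        for v in vertices:
            if residual_capacity(cap, flow, u, v) > 0 and h[u] > h[v] + 1:
                return False
    return True


# Tiny network s -> a -> t with the edge leaving the source saturated.
vertices = ["s", "a", "t"]
cap = {("s", "a"): 3, ("a", "t"): 2}
flow = {("s", "a"): 3, ("a", "t"): 0}
good_h = {"s": 3, "a": 0, "t": 0}
bad_h = {"s": 3, "a": 2, "t": 0}  # residual edge (a, t) would drop by 2
```

With `bad_h`, the residual edge (a, t) has h(a) > h(t) + 1, which is exactly the situation Lemma 26.12 rules out.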


The  push  operation 

The basic operation Push(u, v) applies if u is an overflowing vertex, c_f(u, v) > 0,
and h(u) = h(v) + 1. The pseudocode below updates the preflow f and the excess
flows for u and v. It assumes that we can compute residual capacity c_f(u, v) in
constant time given c and f. We maintain the excess flow stored at a vertex u as
the attribute u.e and the height of u as the attribute u.h. The expression Δ_f(u, v)
is a temporary variable that stores the amount of flow that we can push from u to v.


³ In the literature, a height function is typically called a “distance function,” and the height of a vertex
is called a “distance label.” We use the term “height” because it is more suggestive of the intuition
behind the algorithm. We retain the use of the term “relabel” to refer to the operation that increases
the height of a vertex. The height of a vertex is related to its distance from the sink t, as would be
found in a breadth-first search of the transpose G^T.

Push(u, v)
1  // Applies when: u is overflowing, c_f(u, v) > 0, and u.h = v.h + 1.
2  // Action: Push Δ_f(u, v) = min(u.e, c_f(u, v)) units of flow from u to v.
3  Δ_f(u, v) = min(u.e, c_f(u, v))
4  if (u, v) ∈ E
5      (u, v).f = (u, v).f + Δ_f(u, v)
6  else (v, u).f = (v, u).f − Δ_f(u, v)
7  u.e = u.e − Δ_f(u, v)
8  v.e = v.e + Δ_f(u, v)

The code for Push operates as follows. Because vertex u has a positive excess u.e
and the residual capacity of (u, v) is positive, we can increase the flow from u to v
by Δ_f(u, v) = min(u.e, c_f(u, v)) without causing u.e to become negative or the
capacity c(u, v) to be exceeded. Line 3 computes the value Δ_f(u, v), and lines 4-6
update f. Line 5 increases the flow on edge (u, v), because we are pushing flow
over a residual edge that is also an original edge. Line 6 decreases the flow on
edge (v, u), because the residual edge is actually the reverse of an edge in the
original network. Finally, lines 7-8 update the excess flows into vertices u and v.
Thus, if f is a preflow before Push is called, it remains a preflow afterward.
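The Push procedure translates almost line for line into executable code. The sketch below assumes a hypothetical dictionary-based representation (`cap`, `flow`, `excess`, `height`) and a network without antiparallel edges; it illustrates the operation and is not a tuned implementation.

```python
def residual_capacity(cap, flow, u, v):
    """c_f(u, v) for a network without antiparallel edges (an assumption)."""
    if (u, v) in cap:
        return cap[(u, v)] - flow[(u, v)]
    if (v, u) in cap:
        return flow[(v, u)]
    return 0


def push(cap, flow, excess, height, u, v):
    """Push min(u.e, c_f(u, v)) units from u to v and return the amount pushed."""
    cf = residual_capacity(cap, flow, u, v)
    # Applicability conditions from lines 1-2 of the pseudocode.
    assert excess[u] > 0 and cf > 0 and height[u] == height[v] + 1
    delta = min(excess[u], cf)
    if (u, v) in cap:
        flow[(u, v)] += delta  # the residual edge is an original edge
    else:
        flow[(v, u)] -= delta  # the residual edge reverses an original edge
    excess[u] -= delta
    excess[v] += delta
    return delta


# Vertex a holds 3 units of excess; edge (a, t) has residual capacity 2.
cap = {("s", "a"): 3, ("a", "t"): 2}
flow = {("s", "a"): 3, ("a", "t"): 0}
excess = {"s": -3, "a": 3, "t": 0}
height = {"s": 3, "a": 1, "t": 0}
pushed = push(cap, flow, excess, height, "a", "t")
```

In this example the push moves 2 units and fills edge (a, t) to capacity, so one unit of excess remains at a.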

Observe that nothing in the code for Push depends on the heights of u and v,
yet we prohibit it from being invoked unless u.h = v.h + 1. Thus, we push excess
flow downhill only by a height differential of 1. By Lemma 26.12, no residual
edges exist between two vertices whose heights differ by more than 1, and thus,
as long as the attribute h is indeed a height function, we would gain nothing by
allowing flow to be pushed downhill by a height differential of more than 1.

We call the operation Push(u, v) a push from u to v. If a push operation applies
to some edge (u, v) leaving a vertex u, we also say that the push operation
applies to u. It is a saturating push if edge (u, v) in the residual network becomes
saturated (c_f(u, v) = 0 afterward); otherwise, it is a nonsaturating push. If an
edge becomes saturated, it disappears from the residual network. A simple lemma
characterizes one result of a nonsaturating push.

Lemma  26.13 

After a nonsaturating push from u to v, the vertex u is no longer overflowing.

Proof Since the push was nonsaturating, the amount of flow Δ_f(u, v) actually
pushed must equal u.e prior to the push. Since u.e is reduced by this amount, it
becomes 0 after the push. ■

The relabel operation

The basic operation Relabel(u) applies if u is overflowing and if u.h ≤ v.h for
all edges (u, v) ∈ E_f. In other words, we can relabel an overflowing vertex u if
for every vertex v for which there is residual capacity from u to v, flow cannot be
pushed from u to v because v is not downhill from u. (Recall that by definition,
neither the source s nor the sink t can be overflowing, and so s and t are ineligible
for relabeling.)

Relabel(u)
1  // Applies when: u is overflowing and for all v ∈ V such that (u, v) ∈ E_f,
   //               we have u.h ≤ v.h.
2  // Action: Increase the height of u.
3  u.h = 1 + min {v.h : (u, v) ∈ E_f}

When we call the operation Relabel(u), we say that vertex u is relabeled. Note
that when u is relabeled, E_f must contain at least one edge that leaves u, so that
the minimization in the code is over a nonempty set. This property follows from
the assumption that u is overflowing, which in turn tells us that

$$u.e = \sum_{v \in V} f(v, u) - \sum_{v \in V} f(u, v) > 0 .$$

Since all flows are nonnegative, we must therefore have at least one vertex v such
that (v, u).f > 0. But then, c_f(u, v) > 0, which implies that (u, v) ∈ E_f. The
operation Relabel(u) thus gives u the greatest height allowed by the constraints
on height functions.
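As an illustration of Relabel, the sketch below lifts an overflowing vertex to one more than its lowest residual neighbor. The dictionary-based representation and the helper names are assumptions made for this example only, and the network is assumed to have no antiparallel edges.

```python
def residual_capacity(cap, flow, u, v):
    """c_f(u, v) for a network without antiparallel edges (an assumption)."""
    if (u, v) in cap:
        return cap[(u, v)] - flow[(u, v)]
    if (v, u) in cap:
        return flow[(v, u)]
    return 0


def relabel(vertices, cap, flow, excess, height, u):
    """Set u.h to 1 + min{v.h : (u, v) in E_f}."""
    neighbors = [v for v in vertices if residual_capacity(cap, flow, u, v) > 0]
    # Applicability: u overflows and no residual neighbor lies below u.
    assert excess[u] > 0
    assert all(height[u] <= height[v] for v in neighbors)
    height[u] = 1 + min(height[v] for v in neighbors)


vertices = ["s", "a", "t"]
cap = {("s", "a"): 3, ("a", "t"): 2}
flow = {("s", "a"): 3, ("a", "t"): 0}
excess = {"s": -3, "a": 3, "t": 0}
height = {"s": 3, "a": 0, "t": 0}

relabel(vertices, cap, flow, excess, height, "a")
first_height = height["a"]  # lowest residual neighbor is t at height 0

# Simulate a saturating push of 2 units over (a, t), then relabel again.
flow[("a", "t")] = 2
excess["a"], excess["t"] = 1, 2
relabel(vertices, cap, flow, excess, height, "a")
second_height = height["a"]  # only residual neighbor is now s at height 3
```

The second relabel raises a above the source, which is how leftover excess eventually finds its way back to s.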

The  generic  algorithm 

The  generic  push-relabel  algorithm  uses  the  following  subroutine  to  create  an  ini¬ 
tial  preflow  in  the  flow  network. 

Initialize-Preflow(G, s)
1  for each vertex v ∈ G.V
2      v.h = 0
3      v.e = 0
4  for each edge (u, v) ∈ G.E
5      (u, v).f = 0
6  s.h = |G.V|
7  for each vertex v ∈ s.Adj
8      (s, v).f = c(s, v)
9      v.e = c(s, v)
10     s.e = s.e − c(s, v)

Initialize-Preflow creates an initial preflow f defined by

$$(u, v).f = \begin{cases} c(u, v) & \text{if } u = s , \\ 0 & \text{otherwise} . \end{cases} \tag{26.15}$$


That  is,  we  fill  to  capacity  each  edge  leaving  the  source  s,  and  all  other  edges  carry 
no  flow.  For  each  vertex  v  adjacent  to  the  source,  we  initially  have  v.e  =  c(s,  v), 
and  we  initialize  s.e  to  the  negative  of  the  sum  of  these  capacities.  The  generic 
algorithm  also  begins  with  an  initial  height  function  h,  given  by 


$$u.h = \begin{cases} |V| & \text{if } u = s , \\ 0 & \text{otherwise} . \end{cases} \tag{26.16}$$


Equation (26.16) defines a height function because the only edges (u, v) for which
u.h > v.h + 1 are those for which u = s, and those edges are saturated, which
means that they are not in the residual network.
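The initialization step can be sketched in a few lines. The code below assumes a hypothetical representation in which `cap` maps each edge to its capacity; the function name mirrors the pseudocode, but the data layout is an illustrative choice rather than anything the text prescribes.

```python
def initialize_preflow(vertices, cap, s):
    """Saturate every edge leaving s; all other edges carry no flow."""
    height = {v: 0 for v in vertices}
    excess = {v: 0 for v in vertices}
    flow = {edge: 0 for edge in cap}
    height[s] = len(vertices)
    for (u, v), c in cap.items():
        if u == s:
            flow[(u, v)] = c
            excess[v] = c
            excess[s] -= c
    return flow, excess, height


vertices = ["s", "a", "b", "t"]
cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 1, ("a", "b"): 1, ("b", "t"): 3}
flow, excess, height = initialize_preflow(vertices, cap, "s")
```

Here the source ends with excess −5, the negative of the total capacity leaving it, and the two neighbors of s start out overflowing.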

Initialization,  followed  by  a  sequence  of  push  and  relabel  operations,  executed 
in  no  particular  order,  yields  the  Generic-Push-Relabel  algorithm: 


Generic-Push-Relabel(G)
1  Initialize-Preflow(G, s)
2  while there exists an applicable push or relabel operation
3      select an applicable push or relabel operation and perform it


The  following  lemma  tells  us  that  as  long  as  an  overflowing  vertex  exists,  at  least 
one  of  the  two  basic  operations  applies. 


Lemma 26.14 (An overflowing vertex can be either pushed or relabeled)

Let G = (V, E) be a flow network with source s and sink t, let f be a preflow,
and let h be any height function for f. If u is any overflowing vertex, then either a
push or relabel operation applies to it.

Proof For any residual edge (u, v), we have h(u) ≤ h(v) + 1 because h is a
height function. If a push operation does not apply to an overflowing vertex u,
then for all residual edges (u, v), we must have h(u) < h(v) + 1, which implies
h(u) ≤ h(v). Thus, a relabel operation applies to u. ■
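Putting the pieces together, the generic method fits in one short function. This is a sketch under stated assumptions: the network has no antiparallel edges, edges are stored in a hypothetical dictionary `cap`, and the "select an applicable operation" step simply takes the first overflowing vertex and the first downhill residual edge it finds. It makes no attempt to meet the running-time bounds discussed later in the section.

```python
def generic_push_relabel(vertices, cap, s, t):
    """Return (flow value, flow dict) computed by the generic method."""
    flow = {e: 0 for e in cap}
    excess = {v: 0 for v in vertices}
    height = {v: 0 for v in vertices}
    height[s] = len(vertices)
    for (u, v), c in cap.items():  # initial preflow: saturate edges leaving s
        if u == s:
            flow[(u, v)] = c
            excess[v] += c
            excess[s] -= c

    def cf(u, v):
        if (u, v) in cap:
            return cap[(u, v)] - flow[(u, v)]
        if (v, u) in cap:
            return flow[(v, u)]
        return 0

    while True:
        active = [u for u in vertices if u not in (s, t) and excess[u] > 0]
        if not active:
            return excess[t], flow
        u = active[0]
        residual = [v for v in vertices if cf(u, v) > 0]
        downhill = [v for v in residual if height[u] == height[v] + 1]
        if downhill:  # a push applies
            v = downhill[0]
            delta = min(excess[u], cf(u, v))
            if (u, v) in cap:
                flow[(u, v)] += delta
            else:
                flow[(v, u)] -= delta
            excess[u] -= delta
            excess[v] += delta
        else:  # by Lemma 26.14, a relabel applies
            height[u] = 1 + min(height[v] for v in residual)


vertices = ["s", "a", "b", "t"]
cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 1, ("a", "b"): 1, ("b", "t"): 3}
value, flow = generic_push_relabel(vertices, cap, "s", "t")
```

In this small network, vertex a receives 3 units but can forward only 2, so the algorithm has to lift a above the source and return one unit to s before it stops.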


Correctness  of  the  push-relabel  method 

To show that the generic push-relabel algorithm solves the maximum-flow problem,
we shall first prove that if it terminates, the preflow f is a maximum flow.
We shall later prove that it terminates. We start with some observations about the
height function h.

Lemma 26.15 (Vertex heights never decrease)

During the execution of the Generic-Push-Relabel procedure on a flow network
G = (V, E), for each vertex u ∈ V, the height u.h never decreases. Moreover,
whenever a relabel operation is applied to a vertex u, its height u.h increases
by at least 1.

Proof Because vertex heights change only during relabel operations, it suffices
to prove the second statement of the lemma. If vertex u is about to be relabeled,
then for all vertices v such that (u, v) ∈ E_f, we have u.h ≤ v.h. Thus,
u.h < 1 + min{v.h : (u, v) ∈ E_f}, and so the operation must increase u.h. ■

Lemma  26.16 

Let G = (V, E) be a flow network with source s and sink t. Then the execution of
Generic-Push-Relabel on G maintains the attribute h as a height function.

Proof The proof is by induction on the number of basic operations performed.
Initially, h is a height function, as we have already observed.

We claim that if h is a height function, then an operation Relabel(u) leaves h
a height function. If we look at a residual edge (u, v) ∈ E_f that leaves u, then
the operation Relabel(u) ensures that u.h ≤ v.h + 1 afterward. Now consider
a residual edge (w, u) that enters u. By Lemma 26.15, w.h ≤ u.h + 1 before the
operation Relabel(u) implies w.h < u.h + 1 afterward. Thus, the operation
Relabel(u) leaves h a height function.

Now, consider an operation Push(u, v). This operation may add the edge (v, u)
to E_f, and it may remove (u, v) from E_f. In the former case, we have
v.h = u.h − 1 < u.h + 1, and so h remains a height function. In the latter case,
removing (u, v) from the residual network removes the corresponding constraint,
and h again remains a height function. ■

The  following  lemma  gives  an  important  property  of  height  functions. 

Lemma 26.17

Let G = (V, E) be a flow network with source s and sink t, let f be a preflow
in G, and let h be a height function on V. Then there is no path from the source s
to the sink t in the residual network G_f.

Proof Assume for the sake of contradiction that G_f contains a path p from s to t,
where p = ⟨v_0, v_1, ..., v_k⟩, v_0 = s, and v_k = t. Without loss of generality, p
is a simple path, and so k < |V|. For i = 0, 1, ..., k − 1, edge (v_i, v_{i+1}) ∈ E_f.
Because h is a height function, h(v_i) ≤ h(v_{i+1}) + 1 for i = 0, 1, ..., k − 1.
Combining these inequalities over path p yields h(s) ≤ h(t) + k. But because h(t) = 0,
we have h(s) ≤ k < |V|, which contradicts the requirement that h(s) = |V| in a
height function. ■
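The step "combining these inequalities" is a telescoping chain. Written out, with the same symbols as in the proof, it reads as follows.

```latex
\begin{align*}
h(s) = h(v_0) &\le h(v_1) + 1 \\
              &\le h(v_2) + 2 \\
              &\;\;\vdots \\
              &\le h(v_k) + k \;=\; h(t) + k \;=\; k .
\end{align*}
```

For instance, in a network with |V| = 4, a simple path has at most 3 edges, so any residual path from s to t would force h(s) ≤ 3, which is incompatible with h(s) = 4.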

We  are  now  ready  to  show  that  if  the  generic  push-relabel  algorithm  terminates, 
the  preflow  it  computes  is  a  maximum  flow. 

Theorem 26.18 (Correctness of the generic push-relabel algorithm)

If the algorithm Generic-Push-Relabel terminates when run on a flow network
G = (V, E) with source s and sink t, then the preflow f it computes is a
maximum flow for G.

Proof We use the following loop invariant:

    Each time the while loop test in line 2 in Generic-Push-Relabel is
    executed, f is a preflow.

Initialization: Initialize-Preflow makes f a preflow.

Maintenance: The only operations within the while loop of lines 2-3 are push and
relabel. Relabel operations affect only height attributes and not the flow values;
hence they do not affect whether f is a preflow. As argued in the discussion of the
Push procedure, if f is a preflow prior to a push operation, it remains a preflow
afterward.

Termination: At termination, each vertex in V − {s, t} must have an excess of 0,
because by Lemma 26.14 and the invariant that f is always a preflow, there are
no overflowing vertices. Therefore, f is a flow. Lemma 26.16 shows that h is
a height function at termination, and thus Lemma 26.17 tells us that there is no
path from s to t in the residual network G_f. By the max-flow min-cut theorem
(Theorem 26.6), therefore, f is a maximum flow. ■

Analysis  of  the  push-relabel  method 

To show that the generic push-relabel algorithm indeed terminates, we shall bound
the number of operations it performs. We bound separately each of the three types
of operations: relabels, saturating pushes, and nonsaturating pushes. With knowledge
of these bounds, it is a straightforward problem to construct an algorithm that
runs in O(V²E) time. Before beginning the analysis, however, we prove an important
lemma. Recall that we allow edges into the source in the residual network.

Lemma  26.19 

Let G = (V, E) be a flow network with source s and sink t, and let f be a preflow
in G. Then, for any overflowing vertex x, there is a simple path from x to s in the
residual network G_f.

Proof For an overflowing vertex x, let U = {v : there exists a simple path from x
to v in G_f}, and suppose for the sake of contradiction that s ∉ U. Let Ū = V − U.

We take the definition of excess from equation (26.14), sum over all vertices
in U, and note that V = U ∪ Ū, to obtain


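$$
\begin{aligned}
\sum_{u \in U} e(u)
&= \sum_{u \in U} \Bigl( \sum_{v \in V} f(v, u) - \sum_{v \in V} f(u, v) \Bigr) \\
&= \sum_{u \in U} \Bigl( \Bigl( \sum_{v \in U} f(v, u) + \sum_{v \in \overline{U}} f(v, u) \Bigr)
   - \Bigl( \sum_{v \in U} f(u, v) + \sum_{v \in \overline{U}} f(u, v) \Bigr) \Bigr) \\
&= \sum_{u \in U} \sum_{v \in U} f(v, u) + \sum_{u \in U} \sum_{v \in \overline{U}} f(v, u)
   - \sum_{u \in U} \sum_{v \in U} f(u, v) - \sum_{u \in U} \sum_{v \in \overline{U}} f(u, v) \\
&= \sum_{u \in U} \sum_{v \in \overline{U}} f(v, u) - \sum_{u \in U} \sum_{v \in \overline{U}} f(u, v) .
\end{aligned}
$$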


We know that the quantity Σ_{u ∈ U} e(u) must be positive because e(x) > 0, x ∈ U,
all vertices other than s have nonnegative excess, and, by assumption, s ∉ U. Thus,
we have

$$\sum_{u \in U} \sum_{v \in \overline{U}} f(v, u) - \sum_{u \in U} \sum_{v \in \overline{U}} f(u, v) > 0 . \tag{26.17}$$

All edge flows are nonnegative, and so for equation (26.17) to hold, we must have
Σ_{u ∈ U} Σ_{v ∈ Ū} f(v, u) > 0. Hence, there must exist at least one pair of vertices
u' ∈ U and v' ∈ Ū with f(v', u') > 0. But, if f(v', u') > 0, there must be a
residual edge (u', v'), which means that there is a simple path from x to v' (the
path x ⇝ u' → v'), thus contradicting the definition of U. ■


The  next  lemma  bounds  the  heights  of  vertices,  and  its  corollary  bounds  the 
number  of  relabel  operations  that  are  performed  in  total. 


Lemma  26.20 

Let G = (V, E) be a flow network with source s and sink t. At any time during
the execution of Generic-Push-Relabel on G, we have u.h ≤ 2|V| − 1 for all
vertices u ∈ V.


Proof The heights of the source s and the sink t never change because these
vertices are by definition not overflowing. Thus, we always have s.h = |V| and
t.h = 0, both of which are no greater than 2|V| − 1.

Now consider any vertex u ∈ V − {s, t}. Initially, u.h = 0 ≤ 2|V| − 1. We shall
show that after each relabeling operation, we still have u.h ≤ 2|V| − 1. When u is
relabeled, it is overflowing, and Lemma 26.19 tells us that there is a simple path p
from u to s in G_f. Let p = ⟨v_0, v_1, ..., v_k⟩, where v_0 = u, v_k = s, and k ≤ |V| − 1
because p is simple. For i = 0, 1, ..., k − 1, we have (v_i, v_{i+1}) ∈ E_f, and
therefore, by Lemma 26.16, v_i.h ≤ v_{i+1}.h + 1. Expanding these inequalities over
path p yields u.h = v_0.h ≤ v_k.h + k ≤ s.h + (|V| − 1) = 2|V| − 1. ■


Corollary 26.21 (Bound on relabel operations)

Let G = (V, E) be a flow network with source s and sink t. Then, during the
execution of Generic-Push-Relabel on G, the number of relabel operations is
at most 2|V| − 1 per vertex and at most (2|V| − 1)(|V| − 2) < 2|V|² overall.

Proof Only the |V| − 2 vertices in V − {s, t} may be relabeled. Let u ∈ V − {s, t}.
The operation Relabel(u) increases u.h. The value of u.h is initially 0 and by
Lemma 26.20, it grows to at most 2|V| − 1. Thus, each vertex u ∈ V − {s, t}
is relabeled at most 2|V| − 1 times, and the total number of relabel operations
performed is at most (2|V| − 1)(|V| − 2) < 2|V|². ■

Lemma  26.20  also  helps  us  to  bound  the  number  of  saturating  pushes. 

Lemma 26.22 (Bound on saturating pushes)

During the execution of Generic-Push-Relabel on any flow network G =
(V, E), the number of saturating pushes is less than 2|V||E|.

Proof For any pair of vertices u, v ∈ V, we will count the saturating pushes
from u to v and from v to u together, calling them the saturating pushes between u
and v. If there are any such pushes, at least one of (u, v) and (v, u) is actually
an edge in E. Now, suppose that a saturating push from u to v has occurred.
At that time, v.h = u.h − 1. In order for another push from u to v to occur
later, the algorithm must first push flow from v to u, which cannot happen until
v.h = u.h + 1. Since u.h never decreases, in order for v.h = u.h + 1, the
value of v.h must increase by at least 2. Likewise, u.h must increase by at least 2
between saturating pushes from v to u. Heights start at 0 and, by Lemma 26.20,
never exceed 2|V| − 1, which implies that the number of times any vertex can have
its height increase by 2 is less than |V|. Since at least one of u.h and v.h must
increase by 2 between any two saturating pushes between u and v, there are fewer
than 2|V| saturating pushes between u and v. Multiplying by the number of edges
gives a bound of less than 2|V||E| on the total number of saturating pushes. ■

The  following  lemma  bounds  the  number  of  nonsaturating  pushes  in  the  generic 
push-relabel  algorithm. 

Lemma 26.23 (Bound on nonsaturating pushes)

During the execution of Generic-Push-Relabel on any flow network G =
(V, E), the number of nonsaturating pushes is less than 4|V|²(|V| + |E|).

Proof Define a potential function Φ = Σ_{v : e(v) > 0} v.h. Initially, Φ = 0, and the
value of Φ may change after each relabeling, saturating push, and nonsaturating
push. We will bound the amount that saturating pushes and relabelings can contribute
to the increase of Φ. Then we will show that each nonsaturating push must
decrease Φ by at least 1, and will use these bounds to derive an upper bound on the
number of nonsaturating pushes.

Let us examine the two ways in which Φ might increase. First, relabeling a
vertex u increases Φ by less than 2|V|, since the set over which the sum is taken is
the same and the relabeling cannot increase u's height by more than its maximum
possible height, which, by Lemma 26.20, is at most 2|V| − 1. Second, a saturating
push from a vertex u to a vertex v increases Φ by less than 2|V|, since no heights
change and only vertex v, whose height is at most 2|V| − 1, can possibly become
overflowing.

Now we show that a nonsaturating push from u to v decreases Φ by at least 1.
Why? Before the nonsaturating push, u was overflowing, and v may or may not
have been overflowing. By Lemma 26.13, u is no longer overflowing after the
push. In addition, unless v is the source, it may or may not be overflowing after
the push. Therefore, the potential function Φ has decreased by exactly u.h, and it
has increased by either 0 or v.h. Since u.h − v.h = 1, the net effect is that the
potential function has decreased by at least 1.

Thus, during the course of the algorithm, the total amount of increase in Φ is
due to relabelings and saturating pushes, and Corollary 26.21 and Lemma 26.22
constrain the increase to be less than (2|V|)(2|V|²) + (2|V|)(2|V||E|) =
4|V|²(|V| + |E|). Since Φ ≥ 0, the total amount of decrease, and therefore the
total number of nonsaturating pushes, is less than 4|V|²(|V| + |E|). ■
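The arithmetic behind the final bound is short enough to spell out. Each relabel and each saturating push raises the potential by less than 2|V|, and there are fewer than 2|V|² relabels and fewer than 2|V||E| saturating pushes, so the total increase is less than

```latex
\begin{align*}
(2|V|)(2|V|^2) + (2|V|)(2|V|\,|E|)
  &= 4|V|^3 + 4|V|^2 |E| \\
  &= 4|V|^2 \bigl( |V| + |E| \bigr) .
\end{align*}
```

As an illustrative instance, a network with |V| = 6 and |E| = 9 gives 4 · 36 · 15 = 2160 as the upper limit on nonsaturating pushes.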

Having bounded the number of relabelings, saturating pushes, and nonsaturating
pushes, we have set the stage for the following analysis of the
Generic-Push-Relabel procedure, and hence of any algorithm based on the push-relabel
method.

Theorem  26.24 

During the execution of Generic-Push-Relabel on any flow network G =
(V, E), the number of basic operations is O(V²E).

Proof  Immediate  from  Corollary  26.21  and  Lemmas  26.22  and  26.23.  ■ 
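One way to see the three bounds at work is to instrument a run and compare the counts with Corollary 26.21 and Lemmas 26.22 and 26.23. The sketch below assumes the same hypothetical dictionary representation used for illustration elsewhere (no antiparallel edges) and an arbitrary selection rule; the counter names are invented for this example.

```python
def count_operations(vertices, cap, s, t):
    """Run the generic method and count each kind of basic operation."""
    flow = {e: 0 for e in cap}
    excess = {v: 0 for v in vertices}
    height = {v: 0 for v in vertices}
    height[s] = len(vertices)
    for (u, v), c in cap.items():
        if u == s:
            flow[(u, v)] = c
            excess[v] += c
            excess[s] -= c

    def cf(u, v):
        if (u, v) in cap:
            return cap[(u, v)] - flow[(u, v)]
        if (v, u) in cap:
            return flow[(v, u)]
        return 0

    counts = {"relabel": 0, "saturating": 0, "nonsaturating": 0}
    while True:
        active = [u for u in vertices if u not in (s, t) and excess[u] > 0]
        if not active:
            return counts
        u = active[0]
        residual = [v for v in vertices if cf(u, v) > 0]
        downhill = [v for v in residual if height[u] == height[v] + 1]
        if not downhill:
            height[u] = 1 + min(height[v] for v in residual)
            counts["relabel"] += 1
            continue
        v = downhill[0]
        delta = min(excess[u], cf(u, v))
        if (u, v) in cap:
            flow[(u, v)] += delta
        else:
            flow[(v, u)] -= delta
        excess[u] -= delta
        excess[v] += delta
        # A push is saturating exactly when c_f(u, v) is 0 afterward.
        counts["saturating" if cf(u, v) == 0 else "nonsaturating"] += 1


vertices = ["s", "a", "b", "t"]
cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 1, ("a", "b"): 1, ("b", "t"): 3}
counts = count_operations(vertices, cap, "s", "t")
n, m = len(vertices), len(cap)
bounds = {
    "relabel": (2 * n - 1) * (n - 2),
    "saturating": 2 * n * m,
    "nonsaturating": 4 * n * n * (n + m),
}
```

On small inputs the observed counts tend to sit far below the worst-case bounds, which is expected: the bounds hold for every order in which operations might be selected.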

Thus, the algorithm terminates after O(V²E) operations. All that remains is
to give an efficient method for implementing each operation and for choosing an
appropriate operation to execute.

Corollary  26.25 

There is an implementation of the generic push-relabel algorithm that runs in
O(V²E) time on any flow network G = (V, E).

Proof Exercise 26.4-2 asks you to show how to implement the generic algorithm
with an overhead of O(V) per relabel operation and O(1) per push. It also asks
you to design a data structure that allows you to pick an applicable operation in
O(1) time. The corollary then follows. ■

Exercises 


26.4-1 

Prove that, after the procedure Initialize-Preflow(G, s) terminates, we have
s.e ≤ −|f*|, where f* is a maximum flow for G.


26.4-2 

Show how to implement the generic push-relabel algorithm using O(V) time per
relabel operation, O(1) time per push, and O(1) time to select an applicable
operation, for a total time of O(V²E).


26.4-3 

Prove that the generic push-relabel algorithm spends a total of only O(VE) time
in performing all the O(V²) relabel operations.


26.4-4 

Suppose that we have found a maximum flow in a flow network G = (V, E) using
a  push-relabel  algorithm.  Give  a  fast  algorithm  to  find  a  minimum  cut  in  G. 


26.4-5 

Give  an  efficient  push-relabel  algorithm  to  find  a  maximum  matching  in  a  bipartite 
graph.  Analyze  your  algorithm. 


26.4-6 

Suppose that all edge capacities in a flow network G = (V, E) are in the set
{1, 2, ..., k}. Analyze the running time of the generic push-relabel algorithm in
terms of |V|, |E|, and k. (Hint: How many times can each edge support a
nonsaturating push before it becomes saturated?)

26.4-7

Show that we could change line 6 of Initialize-Preflow to

6  s.h = |G.V| − 2

without affecting the correctness or asymptotic performance of the generic
push-relabel algorithm.


26.4-8

Let δ_f(u, v) be the distance (number of edges) from u to v in the residual
network G_f. Show that the Generic-Push-Relabel procedure maintains the
properties that u.h < |V| implies u.h ≤ δ_f(u, t) and that u.h ≥ |V| implies
u.h − |V| ≤ δ_f(u, s).

26.4-9 ★

As in the previous exercise, let δ_f(u, v) be the distance from u to v in the residual
network G_f. Show how to modify the generic push-relabel algorithm to maintain
the property that u.h < |V| implies u.h = δ_f(u, t) and that u.h ≥ |V| implies
u.h − |V| = δ_f(u, s). The total time that your implementation dedicates to
maintaining this property should be O(VE).

26.4-10

Show that the number of nonsaturating pushes executed by the
Generic-Push-Relabel procedure on a flow network G = (V, E) is at most 4|V|²|E| for
|V| ≥ 4.


★  26.5  The  relabel-to-front  algorithm 

The push-relabel method allows us to apply the basic operations in any order at
all. By choosing the order carefully and managing the network data structure
efficiently, however, we can solve the maximum-flow problem faster than the O(V²E)
bound given by Corollary 26.25. We shall now examine the relabel-to-front
algorithm, a push-relabel algorithm whose running time is O(V³), which is
asymptotically at least as good as O(V²E), and even better for dense networks.

The relabel-to-front algorithm maintains a list of the vertices in the network.
Beginning at the front, the algorithm scans the list, repeatedly selecting an
overflowing vertex u and then “discharging” it, that is, performing push and relabel
operations until u no longer has a positive excess. Whenever we relabel a vertex,
we move it to the front of the list (hence the name “relabel-to-front”) and the
algorithm begins its scan anew.




The  correctness  and  analysis  of  the  relabel-to-front  algorithm  depend  on  the 
notion  of  “admissible”  edges:  those  edges  in  the  residual  network  through  which 
flow  can  be  pushed.  After  proving  some  properties  about  the  network  of  admissible 
edges,  we  shall  investigate  the  discharge  operation  and  then  present  and  analyze  the 
relabel-to-front  algorithm  itself. 

Admissible  edges  and  networks 

If G = (V, E) is a flow network with source s and sink t, f is a preflow in G, and h
is a height function, then we say that (u, v) is an admissible edge if c_f(u, v) > 0
and h(u) = h(v) + 1. Otherwise, (u, v) is inadmissible. The admissible network
is G_{f,h} = (V, E_{f,h}), where E_{f,h} is the set of admissible edges.

The  admissible  network  consists  of  those  edges  through  which  we  can  push  flow. 
The  following  lemma  shows  that  this  network  is  a  directed  acyclic  graph  (dag). 

Lemma 26.26 (The admissible network is acyclic)

If G = (V, E) is a flow network, f is a preflow in G, and h is a height function
on G, then the admissible network G_{f,h} = (V, E_{f,h}) is acyclic.

Proof The proof is by contradiction. Suppose that G_{f,h} contains a cycle p =
⟨v_0, v_1, ..., v_k⟩, where v_0 = v_k and k > 0. Since each edge in p is admissible, we
have h(v_{i−1}) = h(v_i) + 1 for i = 1, 2, ..., k. Summing around the cycle gives

$$
\begin{aligned}
\sum_{i=1}^{k} h(v_{i-1}) &= \sum_{i=1}^{k} \bigl( h(v_i) + 1 \bigr) \\
                          &= \sum_{i=1}^{k} h(v_i) + k .
\end{aligned}
$$

Because each vertex in cycle p appears once in each of the summations, we derive
the contradiction that 0 = k. ■
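To make the admissible network concrete, the sketch below lists the admissible edges for a given preflow and height function and then confirms acyclicity with a standard topological-removal test (Kahn's algorithm, which is a swapped-in technique and not part of the text). The dictionary representation and all helper names are assumptions for this example; the network is assumed to have no antiparallel edges.

```python
def residual_capacity(cap, flow, u, v):
    """c_f(u, v) for a network without antiparallel edges (an assumption)."""
    if (u, v) in cap:
        return cap[(u, v)] - flow[(u, v)]
    if (v, u) in cap:
        return flow[(v, u)]
    return 0


def admissible_edges(vertices, cap, flow, h):
    """Edges (u, v) with c_f(u, v) > 0 and h(u) = h(v) + 1."""
    return [(u, v) for u in vertices for v in vertices
            if residual_capacity(cap, flow, u, v) > 0 and h[u] == h[v] + 1]


def is_acyclic(vertices, edges):
    """Kahn's algorithm: acyclic iff every vertex can be removed in turn."""
    indegree = {v: 0 for v in vertices}
    for _, v in edges:
        indegree[v] += 1
    ready = [v for v in vertices if indegree[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return removed == len(vertices)


# State just after initialization and one relabel of vertex a.
vertices = ["s", "a", "b", "t"]
cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 1, ("a", "b"): 1, ("b", "t"): 3}
flow = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 0, ("a", "b"): 0, ("b", "t"): 0}
h = {"s": 4, "a": 1, "b": 0, "t": 0}
adm = admissible_edges(vertices, cap, flow, h)
```

In this state only the edges leaving a are admissible, since a is the one vertex that sits exactly one level above a residual neighbor.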

The next two lemmas show how push and relabel operations change the admissible
network.

Lemma  26.27 

Let G = (V, E) be a flow network, let f be a preflow in G, and suppose that the
attribute h is a height function. If a vertex u is overflowing and (u, v) is an
admissible edge, then Push(u, v) applies. The operation does not create any new
admissible edges, but it may cause (u, v) to become inadmissible.

Proof By the definition of an admissible edge, we can push flow from u to v.
Since u is overflowing, the operation Push(u, v) applies. The only new residual
edge that pushing flow from u to v can create is (v, u). Since v.h = u.h − 1,
edge (v, u) cannot become admissible. If the operation is a saturating push, then
c_f(u, v) = 0 afterward and (u, v) becomes inadmissible. ■

Lemma  26.28 

Let G = (V, E) be a flow network, let f be a preflow in G, and suppose that
the attribute h is a height function. If a vertex u is overflowing and there are no
admissible edges leaving u, then RELABEL(u) applies. After the relabel operation,
there is at least one admissible edge leaving u, but there are no admissible edges
entering u.

Proof If u is overflowing, then by Lemma 26.14, either a push or a relabel
operation applies to it. If there are no admissible edges leaving u, then no flow
can be pushed from u and so RELABEL(u) applies. After the relabel operation,
u.h = 1 + min {v.h : (u, v) ∈ E_f}. Thus, if v is a vertex that realizes the
minimum in this set, the edge (u, v) becomes admissible. Hence, after the relabel, there
is at least one admissible edge leaving u.

To  show  that  no  admissible  edges  enter  u  after  a  relabel  operation,  suppose  that 
there  is  a  vertex  v  such  that  (v,  u)  is  admissible.  Then,  v.h  =  u.h  +  1  after  the 
relabel,  and  so  v.h  >  u.h  +  1  just  before  the  relabel.  But  by  Lemma  26.12,  no 
residual edges exist between vertices whose heights differ by more than 1. Moreover,
relabeling a vertex does not change the residual network. Thus, (v, u) is not
in  the  residual  network,  and  hence  it  cannot  be  in  the  admissible  network.  ■ 

Neighbor  lists 

Edges in the relabel-to-front algorithm are organized into “neighbor lists.” Given
a flow network G = (V, E), the neighbor list u.N for a vertex u ∈ V is a singly
linked list of the neighbors of u in G. Thus, vertex v appears in the list u.N if
(u, v) ∈ E or (v, u) ∈ E. The neighbor list u.N contains exactly those vertices v
for which there may be a residual edge (u, v). The attribute u.N.head points to
the first vertex in u.N, and v.next-neighbor points to the vertex following v in a
neighbor list; this pointer is NIL if v is the last vertex in the neighbor list.

The  relabel-to-front  algorithm  cycles  through  each  neighbor  list  in  an  arbitrary 
order  that  is  fixed  throughout  the  execution  of  the  algorithm.  For  each  vertex  u, 
the attribute u.current points to the vertex currently under consideration in u.N.
Initially, u.current is set to u.N.head.
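
One way to build the neighbor lists is sketched below. Python lists stand in for the singly linked lists of the text, and an integer index stands in for the current pointer; the function name `build_neighbor_lists` and this representation are assumptions of the sketch.

```python
from collections import defaultdict


def build_neighbor_lists(edges):
    """Build u.N for every vertex from an iterable of directed edges.

    Vertex v is placed in N[u] when (u, v) or (v, u) is an edge, so N[u]
    holds exactly the vertices v for which a residual edge (u, v) may exist.
    """
    neighbors = defaultdict(list)
    for u, v in edges:
        if v not in neighbors[u]:
            neighbors[u].append(v)
        if u not in neighbors[v]:
            neighbors[v].append(u)
    return dict(neighbors)


# Hypothetical edge set; (x, y) and (y, x) yield a single entry per list.
edges = [("s", "x"), ("x", "y"), ("y", "x"), ("y", "t")]
N = build_neighbor_lists(edges)
# current[u] is an index into N[u]; the value len(N[u]) plays the role of NIL.
current = {u: 0 for u in N}
print(N["x"])  # ['s', 'y']
```

The order within each list is arbitrary (here, order of first appearance), which matches the requirement that the order merely stay fixed during execution.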

Discharging an overflowing vertex

An  overflowing  vertex  u  is  discharged  by  pushing  all  of  its  excess  flow  through 
admissible  edges  to  neighboring  vertices,  relabeling  u  as  necessary  to  cause  edges 
leaving  u  to  become  admissible.  The  pseudocode  goes  as  follows. 

DISCHARGE(u)
1  while u.e > 0
2      v = u.current
3      if v == NIL
4          RELABEL(u)
5          u.current = u.N.head
6      elseif c_f(u, v) > 0 and u.h == v.h + 1
7          PUSH(u, v)
8      else u.current = v.next-neighbor

Figure  26.9  steps  through  several  iterations  of  the  while  loop  of  lines  1-8,  which 
executes  as  long  as  vertex  u  has  positive  excess.  Each  iteration  performs  exactly 
one  of  three  actions,  depending  on  the  current  vertex  v  in  the  neighbor  list  u.N. 

1. If v is NIL, then we have run off the end of u.N. Line 4 relabels vertex u,
and then line 5 resets the current neighbor of u to be the first one in u.N.
(Lemma 26.29 below states that the relabel operation applies in this situation.)

2. If v is non-NIL and (u, v) is an admissible edge (determined by the test in
line 6), then line 7 pushes some (or possibly all) of u's excess to vertex v.

3. If v is non-NIL but (u, v) is inadmissible, then line 8 advances u.current one
position further in the neighbor list u.N.

Observe that if DISCHARGE is called on an overflowing vertex u, then the last
action performed by DISCHARGE must be a push from u. Why? The procedure
terminates only when u.e becomes zero, and neither the relabel operation nor
advancing the pointer u.current affects the value of u.e.
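
The three cases can be sketched in Python as follows. The dictionaries `cf`, `e`, `h`, `N`, and `cur` are an assumed representation, and the numeric values are chosen to be consistent with the caption of Figure 26.9 rather than transcribed from the figure itself.

```python
def push(u, v, cf, e):
    """PUSH(u, v): move min(u.e, c_f(u, v)) units of flow from u to v."""
    delta = min(e[u], cf[u][v])
    cf[u][v] -= delta
    cf[v][u] += delta
    e[u] -= delta
    e[v] += delta


def relabel(u, cf, h, N):
    """RELABEL(u): set u.h to 1 + the least height over residual edges (u, v)."""
    h[u] = 1 + min(h[v] for v in N[u] if cf[u][v] > 0)


def discharge(u, cf, e, h, N, cur):
    """DISCHARGE(u); returns the number of while-loop iterations performed.

    The index cur[u] == len(N[u]) plays the role of u.current == NIL.
    """
    iterations = 0
    while e[u] > 0:
        iterations += 1
        if cur[u] == len(N[u]):            # ran off the end of u.N
            relabel(u, cf, h, N)
            cur[u] = 0
        else:
            v = N[u][cur[u]]
            if cf[u][v] > 0 and h[u] == h[v] + 1:   # (u, v) is admissible
                push(u, v, cf, e)
            else:
                cur[u] += 1
    return iterations


# Assumed data: y holds 19 units of excess and has neighbors s, x, z.
N = {"y": ["s", "x", "z"]}
cf = {"y": {"s": 14, "x": 5, "z": 8}, "s": {"y": 0}, "x": {"y": 0}, "z": {"y": 0}}
e = {"s": 0, "x": 0, "y": 19, "z": 0}
h = {"s": 5, "x": 1, "y": 0, "z": 0}
cur = {"y": 0}
iterations = discharge("y", cf, e, h, N, cur)
print(iterations, h["y"])  # 15 6
```

With these values the call relabels y three times and pushes 8 units to z, 5 to x, and 6 to s, ending with a push as observed above.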

We  must  be  sure  that  when  Push  or  Relabel  is  called  by  Discharge,  the 
operation  applies.  The  next  lemma  proves  this  fact. 

Lemma  26.29 

If DISCHARGE calls PUSH(u, v) in line 7, then a push operation applies to (u, v).
If DISCHARGE calls RELABEL(u) in line 4, then a relabel operation applies to u.

Proof  The  tests  in  lines  1  and  6  ensure  that  a  push  operation  occurs  only  if  the 
operation  applies,  which  proves  the  first  statement  in  the  lemma. 

Figure 26.9 Discharging a vertex y. It takes 15 iterations of the while loop of DISCHARGE to push
all the excess flow from y. Only the neighbors of y and edges of the flow network that enter or leave y
are shown. In each part of the figure, the number inside each vertex is its excess at the beginning of
the first iteration shown in the part, and each vertex is shown at its height throughout the part. The
neighbor list y.N at the beginning of each iteration appears on the right, with the iteration number
on top. The shaded neighbor is y.current. (a) Initially, there are 19 units of excess to push from y,
and y.current = s. Iterations 1, 2, and 3 just advance y.current, since there are no admissible edges
leaving y. In iteration 4, y.current = NIL (shown by the shading being below the neighbor list),
and so y is relabeled and y.current is reset to the head of the neighbor list. (b) After relabeling,
vertex y has height 1. In iterations 5 and 6, edges (y, s) and (y, x) are found to be inadmissible, but
iteration 7 pushes 8 units of excess flow from y to z. Because of the push, y.current does not advance
in this iteration. (c) Because the push in iteration 7 saturated edge (y, z), it is found inadmissible in
iteration 8. In iteration 9, y.current = NIL, and so vertex y is again relabeled and y.current is reset.
(d) In iteration 10, (y, s) is inadmissible, but iteration 11 pushes 5 units of excess flow from y to x.
(e) Because y.current did not advance in iteration 11, iteration 12 finds (y, x) to be inadmissible.
Iteration 13 finds (y, z) inadmissible, and iteration 14 relabels vertex y and resets y.current.
(f) Iteration 15 pushes 6 units of excess flow from y to s. (g) Vertex y now has no excess flow, and
DISCHARGE terminates. In this example, DISCHARGE both starts and finishes with the current
pointer at the head of the neighbor list, but in general this need not be the case.


To prove the second statement, according to the test in line 1 and Lemma 26.28,
we need only show that all edges leaving u are inadmissible. If a call to
DISCHARGE(u) starts with the pointer u.current at the head of u's neighbor list
and finishes with it off the end of the list, then all of u's outgoing edges are
inadmissible and a relabel operation applies. It is possible, however, that during a
call to DISCHARGE(u), the pointer u.current traverses only part of the list before
the procedure returns. Calls to DISCHARGE on other vertices may then occur,
but u.current will continue moving through the list during the next call to
DISCHARGE(u). We now consider what happens during a complete pass through
the list, which begins at the head of u.N and finishes with u.current = NIL. Once
u.current reaches the end of the list, the procedure relabels u and begins a new
pass. For the u.current pointer to advance past a vertex v ∈ u.N during a pass, the
edge (u, v) must be deemed inadmissible by the test in line 6. Thus, by the time
the pass completes, every edge leaving u has been determined to be inadmissible
at some time during the pass. The key observation is that at the end of the pass,
every edge leaving u is still inadmissible. Why? By Lemma 26.27, pushes cannot
create any admissible edges, regardless of which vertex the flow is pushed from.
Thus, any admissible edge must be created by a relabel operation. But the vertex u
is not relabeled during the pass, and by Lemma 26.28, any other vertex v that is
relabeled during the pass (resulting from a call of DISCHARGE(v)) has no entering
admissible edges after relabeling. Thus, at the end of the pass, all edges leaving u
remain inadmissible, which completes the proof. ■

The  relabel-to-front  algorithm 

In the relabel-to-front algorithm, we maintain a linked list L consisting of all
vertices in V − {s, t}. A key property is that the vertices in L are topologically sorted
according to the admissible network, as we shall see in the loop invariant that
follows. (Recall from Lemma 26.26 that the admissible network is a dag.)

The  pseudocode  for  the  relabel-to-front  algorithm  assumes  that  the  neighbor 
lists  u.N  have  already  been  created  for  each  vertex  u.  It  also  assumes  that  u.next 
points to the vertex that follows u in list L and that, as usual, u.next = NIL if u is
the  last  vertex  in  the  list. 

RELABEL-TO-FRONT(G, s, t)
1  INITIALIZE-PREFLOW(G, s)
2  L = G.V − {s, t}, in any order
3  for each vertex u ∈ G.V − {s, t}
4      u.current = u.N.head
5  u = L.head
6  while u ≠ NIL
7      old-height = u.h
8      DISCHARGE(u)
9      if u.h > old-height
10         move u to the front of list L
11     u = u.next

The  relabel-to-front  algorithm  works  as  follows.  Line  1  initializes  the  preflow 
and  heights  to  the  same  values  as  in  the  generic  push-relabel  algorithm.  Line  2 
initializes  the  list  L  to  contain  all  potentially  overflowing  vertices,  in  any  order. 
Lines 3-4 initialize the current pointer of each vertex u to the first vertex in u's
neighbor list.

As Figure 26.10 illustrates, the while loop of lines 6-11 runs through the list L,
discharging vertices. Line 5 makes it start with the first vertex in the list. Each
time through the loop, line 8 discharges a vertex u. If u was relabeled by the
DISCHARGE procedure, line 10 moves it to the front of list L. We can determine
whether u was relabeled by comparing its height before the discharge operation,
saved into the variable old-height in line 7, with its height afterward, in line 9.
Line 11 makes the next iteration of the while loop use the vertex following u in
list L. If line 10 moved u to the front of the list, the vertex used in the next iteration
is the one following u in its new position in the list.
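
The whole procedure can be sketched compactly in Python. This is an illustrative rendering under assumptions: capacities arrive as a dictionary keyed by edge, Python lists replace the linked lists, vertex labels are sortable, and the function returns only the flow value (the excess accumulated at t).

```python
def relabel_to_front(capacity, s, t):
    """Return the value of a maximum flow from s to t.

    capacity maps each directed edge (u, v) to its capacity.
    """
    vertices = {x for edge in capacity for x in edge}
    cf = {u: {} for u in vertices}             # residual capacities
    for (u, v), c in capacity.items():
        cf[u][v] = cf[u].get(v, 0) + c
        cf[v].setdefault(u, 0)
    N = {u: list(cf[u]) for u in vertices}     # neighbor lists
    h = {u: 0 for u in vertices}
    e = {u: 0 for u in vertices}

    # INITIALIZE-PREFLOW: lift s to height |V| and saturate its edges.
    h[s] = len(vertices)
    for v in N[s]:
        delta = cf[s][v]
        cf[s][v] -= delta
        cf[v][s] += delta
        e[v] += delta
        e[s] -= delta

    L = [u for u in sorted(vertices) if u not in (s, t)]
    cur = {u: 0 for u in L}                    # len(N[u]) stands for NIL

    def discharge(u):
        while e[u] > 0:
            if cur[u] == len(N[u]):            # relabel and restart the pass
                h[u] = 1 + min(h[v] for v in N[u] if cf[u][v] > 0)
                cur[u] = 0
                continue
            v = N[u][cur[u]]
            if cf[u][v] > 0 and h[u] == h[v] + 1:
                delta = min(e[u], cf[u][v])    # push along admissible (u, v)
                cf[u][v] -= delta
                cf[v][u] += delta
                e[u] -= delta
                e[v] += delta
            else:
                cur[u] += 1

    i = 0
    while i < len(L):
        u = L[i]
        old_height = h[u]
        discharge(u)
        if h[u] > old_height:
            L.insert(0, L.pop(i))              # move u to the front of L
            i = 0
        i += 1                                 # vertex following u in L
    return e[t]


# Hypothetical network whose minimum cut has capacity 5.
cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
print(relabel_to_front(cap, "s", "t"))  # 5
```

Resetting `i` to 0 before the increment mirrors lines 10-11: after u moves to the front, the next vertex examined is the one that follows u in its new position.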

To show that RELABEL-TO-FRONT computes a maximum flow, we shall show
that it is an implementation of the generic push-relabel algorithm. First,
observe that it performs push and relabel operations only when they apply, since
Lemma 26.29 guarantees that DISCHARGE performs them only when they apply.
It remains to show that when RELABEL-TO-FRONT terminates, no basic
operations apply. The remainder of the correctness argument relies on the following
loop invariant:

At each test in line 6 of RELABEL-TO-FRONT, list L is a topological sort
of the vertices in the admissible network G_{f,h} = (V, E_{f,h}), and no vertex
before u in the list has excess flow.

Initialization: Immediately after INITIALIZE-PREFLOW has been run, s.h = |V|
and v.h = 0 for all v ∈ V − {s}. Since |V| ≥ 2 (because V contains at
least s and t), no edge can be admissible. Thus, E_{f,h} = ∅, and any ordering of
V − {s, t} is a topological sort of G_{f,h}.

Because u is initially the head of the list L, there are no vertices before it and
so there are none before it with excess flow.

Figure 26.10 The action of RELABEL-TO-FRONT. (a) A flow network just before the first iteration
of the while loop. Initially, 26 units of flow leave source s. On the right is shown the initial list
L = ⟨x, y, z⟩, where initially u = x. Under each vertex in list L is its neighbor list, with the current
neighbor shaded. Vertex x is discharged. It is relabeled to height 1, 5 units of excess flow are pushed
to y, and the 7 remaining units of excess are pushed to the sink t. Because x is relabeled, it moves
to the head of L, which in this case does not change the structure of L. (b) After x, the next vertex
in L that is discharged is y. Figure 26.9 shows the detailed action of discharging y in this situation.
Because y is relabeled, it is moved to the head of L. (c) Vertex x now follows y in L, and so it is
again discharged, pushing all 5 units of excess flow to t. Because vertex x is not relabeled in this
discharge operation, it remains in place in list L. (d) Since vertex z follows vertex x in L, it is
discharged. It is relabeled to height 1 and all 8 units of excess flow are pushed to t. Because z is
relabeled, it moves to the front of L. (e) Vertex y now follows vertex z in L and is therefore
discharged. But because y has no excess, DISCHARGE immediately returns, and y remains in place
in L. Vertex x is then discharged. Because it, too, has no excess, DISCHARGE again returns, and x
remains in place in L. RELABEL-TO-FRONT has reached the end of list L and terminates. There are
no overflowing vertices, and the preflow is a maximum flow.

Maintenance: To see that each iteration of the while loop maintains the
topological sort, we start by observing that the admissible network is changed only by
push and relabel operations. By Lemma 26.27, push operations do not cause
edges to become admissible. Thus, only relabel operations can create
admissible edges. After a vertex u is relabeled, however, Lemma 26.28 states that there
are no admissible edges entering u but there may be admissible edges leaving u.
Thus, by moving u to the front of L, the algorithm ensures that any admissible
edges leaving u satisfy the topological sort ordering.

To see that no vertex preceding u in L has excess flow, we denote the vertex
that will be u in the next iteration by u'. The vertices that will precede u' in the
next  iteration  include  the  current  u  (due  to  line  11)  and  either  no  other  vertices 
(if  u  is  relabeled)  or  the  same  vertices  as  before  (if  u  is  not  relabeled).  When  u 
is  discharged,  it  has  no  excess  flow  afterward.  Thus,  if  u  is  relabeled  during 
the  discharge,  no  vertices  preceding  u'  have  excess  flow.  If  u  is  not  relabeled 
during  the  discharge,  no  vertices  before  it  on  the  list  acquired  excess  flow  during 
this  discharge,  because  L  remained  topologically  sorted  at  all  times  during  the 
discharge  (as  just  pointed  out,  admissible  edges  are  created  only  by  relabeling, 
not  pushing),  and  so  each  push  operation  causes  excess  flow  to  move  only  to 
vertices  further  down  the  list  (or  to  s  or  t).  Again,  no  vertices  preceding  u'  have 
excess  flow. 

Termination: When the loop terminates, u is just past the end of L, and so the
loop invariant ensures that the excess of every vertex is 0. Thus, no basic
operations apply.
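
For experimentation, the loop invariant can be checked mechanically. The sketch below is a hypothetical helper, not part of the algorithm: `L` is the list as a Python list, `i` is the position of u (with `i == len(L)` meaning u is NIL), and `admissible` is a set of admissible edges.

```python
def invariant_holds(L, i, excess, admissible):
    """Check the loop invariant of RELABEL-TO-FRONT at the test in line 6."""
    position = {v: k for k, v in enumerate(L)}
    # Topological sort: each admissible edge between two list vertices
    # must point forward in L. Edges touching s or t are not constrained.
    sorted_ok = all(
        position[a] < position[b]
        for a, b in admissible
        if a in position and b in position
    )
    # No vertex before u in the list has excess flow.
    no_excess_before = all(excess[v] == 0 for v in L[:i])
    return sorted_ok and no_excess_before


# Hypothetical snapshot: y has been discharged, and x is the current vertex.
L = ["y", "x", "z"]
excess = {"x": 5, "y": 0, "z": 8}
print(invariant_holds(L, 1, excess, {("y", "x"), ("x", "t")}))  # True
print(invariant_holds(L, 2, excess, {("y", "x"), ("x", "t")}))  # False
```

The second call fails because x, which still carries 5 units of excess, would precede the current vertex.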

Analysis 

We shall now show that RELABEL-TO-FRONT runs in O(V^3) time on any flow
network G = (V, E). Since the algorithm is an implementation of the generic
push-relabel algorithm, we shall take advantage of Corollary 26.21, which
provides an O(V) bound on the number of relabel operations executed per vertex and
an O(V^2) bound on the total number of relabel operations overall. In addition,
Exercise 26.4-3 provides an O(VE) bound on the total time spent performing relabel
operations, and Lemma 26.22 provides an O(VE) bound on the total number of
saturating push operations.

Theorem  26.30 

The running time of RELABEL-TO-FRONT on any flow network G = (V, E)
is O(V^3).

Proof Let us consider a “phase” of the relabel-to-front algorithm to be the time
between two consecutive relabel operations. There are O(V^2) phases, since there
are O(V^2) relabel operations. Each phase consists of at most |V| calls to
DISCHARGE, which we can see as follows. If DISCHARGE does not perform a
relabel operation, then the next call to DISCHARGE is further down the list L, and
the length of L is less than |V|. If DISCHARGE does perform a relabel, the next
call to DISCHARGE belongs to a different phase. Since each phase contains at
most |V| calls to DISCHARGE and there are O(V^2) phases, the number of times
DISCHARGE is called in line 8 of RELABEL-TO-FRONT is O(V^3). Thus, the total
work performed by the while loop in RELABEL-TO-FRONT, excluding the work
performed within DISCHARGE, is at most O(V^3).

We must now bound the work performed within DISCHARGE during the
execution of the algorithm. Each iteration of the while loop within DISCHARGE
performs one of three actions. We shall analyze the total amount of work involved
in performing each of these actions.

We start with relabel operations (lines 4-5). Exercise 26.4-3 provides an O(VE)
time bound on all the O(V^2) relabels that are performed.

Now, suppose that the action updates the u.current pointer in line 8. This action
occurs O(degree(u)) times each time a vertex u is relabeled, and O(V · degree(u))
times overall for the vertex. For all vertices, therefore, the total amount of work
done in advancing pointers in neighbor lists is O(VE) by the handshaking lemma
(Exercise B.4-1).
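
Written out, the handshaking step sums the per-vertex bound over all vertices. The display below is a sketch of that arithmetic, using the fact that the degrees in an undirected view of G sum to 2|E|:

```latex
\sum_{u \in V} O\bigl(|V| \cdot \mathrm{degree}(u)\bigr)
  = O\Bigl(|V| \sum_{u \in V} \mathrm{degree}(u)\Bigr)
  = O\bigl(|V| \cdot 2\,|E|\bigr)
  = O(VE) .
```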

The  third  type  of  action  performed  by  DISCHARGE  is  a  push  operation  (line  7). 
We  already  know  that  the  total  number  of  saturating  push  operations  is  O(VE). 
Observe  that  if  a  nonsaturating  push  is  executed,  DISCHARGE  immediately  returns, 
since  the  push  reduces  the  excess  to  0.  Thus,  there  can  be  at  most  one  nonsaturating 
push per call to DISCHARGE. As we have observed, DISCHARGE is called O(V^3)
times, and thus the total time spent performing nonsaturating pushes is O(V^3).

The running time of RELABEL-TO-FRONT is therefore O(V^3 + VE), which
is O(V^3). ■

Exercises 


26.5-1 

Illustrate the execution of RELABEL-TO-FRONT in the manner of Figure 26.10 for
the flow network in Figure 26.1(a). Assume that the initial ordering of vertices in L
is ⟨v_1, v_2, v_3, v_4⟩ and that the neighbor lists are

v_1.N = ⟨s, v_2, v_3⟩ ,

v_2.N = ⟨s, v_1, v_3, v_4⟩ ,

v_3.N = ⟨v_1, v_2, v_4, t⟩ ,

v_4.N = ⟨v_2, v_3, t⟩ .


26.5-2  * 

We would like to implement a push-relabel algorithm in which we maintain a
first-in, first-out queue of overflowing vertices. The algorithm repeatedly discharges the
vertex at the head of the queue, and any vertices that were not overflowing before
the discharge but are overflowing afterward are placed at the end of the queue.
After the vertex at the head of the queue is discharged, it is removed. When the
queue is empty, the algorithm terminates. Show how to implement this algorithm
to compute a maximum flow in O(V^3) time.


26.5-3

Show that the generic algorithm still works if RELABEL updates u.h by
simply computing u.h = u.h + 1. How would this change affect the analysis of
RELABEL-TO-FRONT?

26.5-4 *

Show that if we always discharge a highest overflowing vertex, we can make the
push-relabel method run in O(V^3) time.


26.5-5 

Suppose that at some point in the execution of a push-relabel algorithm, there exists
an integer 0 < k < |V| − 1 for which no vertex has v.h = k. Show that all
vertices with v.h > k are on the source side of a minimum cut. If such a k exists,
the gap heuristic updates every vertex v ∈ V − {s} for which v.h > k, to set
v.h = max(v.h, |V| + 1). Show that the resulting attribute h is a height function.
(The gap heuristic is crucial in making implementations of the push-relabel method
perform well in practice.)


Problems 


26-1  Escape  problem 

An n × n grid is an undirected graph consisting of n rows and n columns of vertices,
as shown in Figure 26.11. We denote the vertex in the ith row and the jth column
by (i, j). All vertices in a grid have exactly four neighbors, except for the boundary
vertices, which are the points (i, j) for which i = 1, i = n, j = 1, or j = n.

Given m ≤ n^2 starting points (x_1, y_1), (x_2, y_2), ..., (x_m, y_m) in the grid, the
escape problem is to determine whether or not there are m vertex-disjoint paths
from the starting points to any m different points on the boundary. For example,
the grid in Figure 26.11(a) has an escape, but the grid in Figure 26.11(b) does not.

a.  Consider  a  flow  network  in  which  vertices,  as  well  as  edges,  have  capacities. 
That  is,  the  total  positive  flow  entering  any  given  vertex  is  subject  to  a  capacity 
constraint.  Show  that  determining  the  maximum  flow  in  a  network  with  edge 
and  vertex  capacities  can  be  reduced  to  an  ordinary  maximum-flow  problem  on 
a  flow  network  of  comparable  size. 

Figure 26.11 Grids for the escape problem. Starting points are black, and other grid vertices are
white. (a) A grid with an escape, shown by shaded paths. (b) A grid with no escape.

b.  Describe  an  efficient  algorithm  to  solve  the  escape  problem,  and  analyze  its 
running  time. 

26-2  Minimum  path  cover 

A path cover of a directed graph G = (V, E) is a set P of vertex-disjoint paths
such that every vertex in V is included in exactly one path in P. Paths may start
and end anywhere, and they may be of any length, including 0. A minimum path
cover of G is a path cover containing the fewest possible paths.

a. Give an efficient algorithm to find a minimum path cover of a directed acyclic
graph G = (V, E). (Hint: Assuming that V = {1, 2, ..., n}, construct the
graph G' = (V', E'), where

V' = {x_0, x_1, ..., x_n} ∪ {y_0, y_1, ..., y_n} ,

E' = {(x_0, x_i) : i ∈ V} ∪ {(y_i, y_0) : i ∈ V} ∪ {(x_i, y_j) : (i, j) ∈ E} ,

and run a maximum-flow algorithm.)

b. Does your algorithm work for directed graphs that contain cycles? Explain.

26-3 Algorithmic consulting

Professor Gore wants to open up an algorithmic consulting company. He has
identified n important subareas of algorithms (roughly corresponding to different
portions of this textbook), which he represents by the set A = {A_1, A_2, ..., A_n}. In
each subarea A_k, he can hire an expert in that area for c_k dollars. The consulting
company has lined up a set J = {J_1, J_2, ..., J_m} of potential jobs. In order to
perform job J_i, the company needs to have hired experts in a subset R_i ⊆ A of
subareas. Each expert can work on multiple jobs simultaneously. If the company
chooses to accept job J_i, it must have hired experts in all subareas in R_i, and it will
take in revenue of p_i dollars.

Professor  Gore’s  job  is  to  determine  which  subareas  to  hire  experts  in  and  which 
jobs  to  accept  in  order  to  maximize  the  net  revenue,  which  is  the  total  income  from 
jobs  accepted  minus  the  total  cost  of  employing  the  experts. 

Consider the following flow network G. It contains a source vertex s, vertices
A_1, A_2, ..., A_n, vertices J_1, J_2, ..., J_m, and a sink vertex t. For k = 1, 2, ..., n,
the flow network contains an edge (s, A_k) with capacity c(s, A_k) = c_k, and
for i = 1, 2, ..., m, the flow network contains an edge (J_i, t) with capacity
c(J_i, t) = p_i. For k = 1, 2, ..., n and i = 1, 2, ..., m, if A_k ∈ R_i, then G
contains an edge (A_k, J_i) with capacity c(A_k, J_i) = ∞.

a. Show that if J_i ∈ T for a finite-capacity cut (S, T) of G, then A_k ∈ T for each
A_k ∈ R_i.

b. Show how to determine the maximum net revenue from the capacity of a
minimum cut of G and the given p_i values.

c. Give an efficient algorithm to determine which jobs to accept and which experts
to hire. Analyze the running time of your algorithm in terms of m, n, and
r = \sum_{i=1}^{m} |R_i|.

26-4  Updating  maximum  flow 

Let G = (V, E) be a flow network with source s, sink t, and integer capacities.
Suppose that we are given a maximum flow in G.

a. Suppose that we increase the capacity of a single edge (u, v) ∈ E by 1. Give
an O(V + E)-time algorithm to update the maximum flow.

b. Suppose that we decrease the capacity of a single edge (u, v) ∈ E by 1. Give
an O(V + E)-time algorithm to update the maximum flow.

26-5  Maximum  flow  by  scaling 

Let G = (V, E) be a flow network with source s, sink t, and an integer capacity
c(u, v) on each edge (u, v) ∈ E. Let C = max_{(u,v) ∈ E} c(u, v).

a. Argue that a minimum cut of G has capacity at most C |E|.

b. For a given number K, show how to find an augmenting path of capacity at
least K in O(E) time, if such a path exists.

We can use the following modification of FORD-FULKERSON-METHOD to
compute a maximum flow in G:

MAX-FLOW-BY-SCALING(G, s, t)
1  C = max_{(u,v) ∈ E} c(u, v)
2  initialize flow f to 0
3  K = 2^{⌊lg C⌋}
4  while K ≥ 1
5      while there exists an augmenting path p of capacity at least K
6          augment flow f along p
7      K = K/2
8  return f

c.  Argue  that  Max-Flow-By-Scaling  returns  a  maximum  flow. 

d. Show that the capacity of a minimum cut of the residual network G_f is at most
2K |E| each time line 4 is executed.

e. Argue that the inner while loop of lines 5-6 executes O(E) times for each value
of K.

f. Conclude that MAX-FLOW-BY-SCALING can be implemented so that it runs
in O(E^2 lg C) time.
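
To experiment with the procedure, one possible Python rendering follows. The breadth-first search restricted to residual edges of capacity at least K is only one way to realize line 5, and the dictionary representation and all names are assumptions of this sketch; it returns the flow value rather than the flow itself.

```python
from collections import deque


def max_flow_by_scaling(capacity, s, t):
    """Capacity-scaling max flow; capacity maps (u, v) to a positive integer."""
    cf = {}                                    # residual capacities
    for (u, v), c in capacity.items():
        cf.setdefault(u, {})
        cf.setdefault(v, {})
        cf[u][v] = cf[u].get(v, 0) + c
        cf[v].setdefault(u, 0)

    def find_path(K):
        """Search residual edges of capacity >= K; return parents or None."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cf[u].items():
                if c >= K and v not in parent:
                    parent[v] = u
                    if v == t:
                        return parent
                    queue.append(v)
        return None

    C = max(capacity.values())
    K = 1 << (C.bit_length() - 1)              # K = 2^floor(lg C)
    flow = 0
    while K >= 1:
        parent = find_path(K)
        while parent is not None:
            # Recover the path from t back to s and its bottleneck capacity.
            path, v = [], t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            delta = min(cf[a][b] for a, b in path)
            for a, b in path:
                cf[a][b] -= delta
                cf[b][a] += delta
            flow += delta
            parent = find_path(K)
        K //= 2
    return flow


# Hypothetical network whose minimum cut has capacity 5.
cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
print(max_flow_by_scaling(cap, "s", "t"))  # 5
```

On this network the phase with K = 2 augments 4 units along two paths, and the phase with K = 1 finds the remaining unit through the edge (a, b).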

26-6  The  Hopcroft-Karp  bipartite  matching  algorithm 

In this problem, we describe a faster algorithm, due to Hopcroft and Karp, for
finding a maximum matching in a bipartite graph. The algorithm runs in O(√V E)
time. Given an undirected, bipartite graph G = (V, E), where V = L ∪ R and
all edges have exactly one endpoint in L, let M be a matching in G. We say that
a simple path P in G is an augmenting path with respect to M if it starts at an
unmatched vertex in L, ends at an unmatched vertex in R, and its edges belong
alternately to M and E − M. (This definition of an augmenting path is related
to, but different from, an augmenting path in a flow network.) In this problem,
we treat a path as a sequence of edges, rather than as a sequence of vertices. A
shortest augmenting path with respect to a matching M is an augmenting path
with a minimum number of edges.

Given two sets A and B, the symmetric difference A ⊕ B is defined as (A − B) ∪
(B − A), that is, the elements that are in exactly one of the two sets.
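
As a small illustration of the definition, Python's `^` operator on sets computes the symmetric difference. The vertex names and the helper `edge` below are hypothetical, and a single example of course proves nothing in general.

```python
def edge(u, v):
    """Represent an undirected edge so that {u, v} and {v, u} compare equal."""
    return frozenset((u, v))


# A matching with one edge, and the path l1 - r1 - l2 - r2 whose edges
# alternate between E - M and M.
M = {edge("l2", "r1")}
P = {edge("l1", "r1"), edge("l2", "r1"), edge("l2", "r2")}

# The ^ operator yields (M - P) | (P - M).
augmented = M ^ P
print(sorted(tuple(sorted(e)) for e in augmented))
# [('l1', 'r1'), ('l2', 'r2')]
```

Here the edge shared by M and P drops out and the two remaining path edges enter, so the result has one more edge than M.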

a. Show that if M is a matching and P is an augmenting path with respect to M,
then the symmetric difference M ⊕ P is a matching and |M ⊕ P| = |M| + 1.
Show that if P_1, P_2, ..., P_k are vertex-disjoint augmenting paths with respect
to M, then the symmetric difference M ⊕ (P_1 ∪ P_2 ∪ ... ∪ P_k) is a matching
with cardinality |M| + k.

The  general  structure  of  our  algorithm  is  the  following: 

HOPCROFT-KARP(G)
1  M = ∅
2  repeat
3      let 𝒫 = {P_1, P_2, ..., P_k} be a maximal set of vertex-disjoint
           shortest augmenting paths with respect to M
4      M = M ⊕ (P_1 ∪ P_2 ∪ ... ∪ P_k)
5  until 𝒫 == ∅
6  return M

The  remainder  of  this  problem  asks  you  to  analyze  the  number  of  iterations  in 
the  algorithm  (that  is,  the  number  of  iterations  in  the  repeat  loop)  and  to  describe 
an  implementation  of  line  3. 

b. Given two matchings M and M* in G, show that every vertex in the graph
G' = (V, M ⊕ M*) has degree at most 2. Conclude that G' is a disjoint
union of simple paths or cycles. Argue that edges in each such simple path
or cycle belong alternately to M or M*. Prove that if |M| ≤ |M*|, then
M ⊕ M* contains at least |M*| − |M| vertex-disjoint augmenting paths with
respect to M.

Let $l$ be the length of a shortest augmenting path with respect to a matching $M$, and
let $P_1, P_2, \ldots, P_k$ be a maximal set of vertex-disjoint augmenting paths of length $l$
with respect to $M$. Let $M' = M \oplus (P_1 \cup \cdots \cup P_k)$, and suppose that $P$ is a shortest
augmenting path with respect to $M'$.

c. Show that if $P$ is vertex-disjoint from $P_1, P_2, \ldots, P_k$, then $P$ has more than $l$
edges.

d. Now suppose that $P$ is not vertex-disjoint from $P_1, P_2, \ldots, P_k$. Let $A$ be the
set of edges $(M \oplus M') \oplus P$. Show that $A = (P_1 \cup P_2 \cup \cdots \cup P_k) \oplus P$ and
that $|A| \ge (k + 1)l$. Conclude that $P$ has more than $l$ edges.

e. Prove that if a shortest augmenting path with respect to $M$ has $l$ edges, the size
of the maximum matching is at most $|M| + |V|/(l + 1)$.

f. Show that the number of repeat loop iterations in the algorithm is at
most $2\sqrt{|V|}$. (Hint: By how much can $M$ grow after iteration number $\sqrt{|V|}$?)

g. Give an algorithm that runs in $O(E)$ time to find a maximal set of vertex-
disjoint shortest augmenting paths $P_1, P_2, \ldots, P_k$ for a given matching $M$.
Conclude that the total running time of HOPCROFT-KARP is $O(\sqrt{V} E)$.


Chapter  notes 

Ahuja, Magnanti, and Orlin [7], Even [103], Lawler [224], Papadimitriou and
Steiglitz [271], and Tarjan [330] are good references for network flow and related
algorithms. Goldberg, Tardos, and Tarjan [139] also provide a nice survey of algorithms
for  network-flow  problems,  and  Schrijver  [304]  has  written  an  interesting  review 
of  historical  developments  in  the  field  of  network  flows. 

The  Ford-Fulkerson  method  is  due  to  Ford  and  Fulkerson  [109],  who  originated 
the  formal  study  of  many  of  the  problems  in  the  area  of  network  flow,  including 
the  maximum-flow  and  bipartite-matching  problems.  Many  early  implementations 
of  the  Ford-Fulkerson  method  found  augmenting  paths  using  breadth-first  search; 
Edmonds and Karp [102], and independently Dinic [89], proved that this strategy
yields  a  polynomial-time  algorithm.  A  related  idea,  that  of  using  “blocking  flows,” 
was  also  first  developed  by  Dinic  [89].  Karzanov  [202]  first  developed  the  idea  of 
preflows. The push-relabel method is due to Goldberg [136] and Goldberg and Tarjan
[140]. Goldberg and Tarjan gave an $O(V^3)$-time algorithm that uses a queue to
maintain the set of overflowing vertices, as well as an algorithm that uses dynamic
trees to achieve a running time of $O(VE \lg(V^2/E + 2))$. Several other researchers
have  developed  push-relabel  maximum-flow  algorithms.  Ahuja  and  Orlin  [9]  and 
Ahuja,  Orlin,  and  Tarjan  [10]  gave  algorithms  that  used  scaling.  Cheriyan  and 
Maheshwari  [62]  proposed  pushing  flow  from  the  overflowing  vertex  of  maximum 
height.  Cheriyan  and  Hagerup  [61]  suggested  randomly  permuting  the  neighbor 
lists,  and  several  researchers  [14,  204,  276]  developed  clever  derandomizations  of 
this  idea,  leading  to  a  sequence  of  faster  algorithms.  The  algorithm  of  King,  Rao, 
and Tarjan [204] is the fastest such algorithm and runs in $O(VE \log_{E/(V \lg V)} V)$
time. 

The  asymptotically  fastest  algorithm  to  date  for  the  maximum-flow  problem,  by 
Goldberg and Rao [138], runs in time $O(\min(V^{2/3}, E^{1/2}) \, E \lg(V^2/E + 2) \lg C)$,
where $C = \max_{(u,v) \in E} c(u, v)$. This algorithm does not use the push-relabel
method  but  instead  is  based  on  finding  blocking  flows.  All  previous  maximum- 
flow  algorithms,  including  the  ones  in  this  chapter,  use  some  notion  of  distance 
(the  push-relabel  algorithms  use  the  analogous  notion  of  height),  with  a  length  of  1 
assigned implicitly to each edge. This new algorithm takes a different approach and
assigns a length of 0 to high-capacity edges and a length of 1 to low-capacity edges.
Informally, with respect to these lengths, shortest paths from the source to the sink
tend to have high capacity, which means that fewer iterations need be performed.

In  practice,  push-relabel  algorithms  currently  dominate  augmenting-path  or 
linear-programming  based  algorithms  for  the  maximum-flow  problem.  A  study 
by Cherkassky and Goldberg [63] underscores the importance of using two heuristics
when implementing a push-relabel algorithm. The first heuristic is to periodically
perform a breadth-first search of the residual network in order to obtain
more  accurate  height  values.  The  second  heuristic  is  the  gap  heuristic,  described  in 
Exercise  26.5-5.  Cherkassky  and  Goldberg  conclude  that  the  best  choice  of  push- 
relabel variants is the one that chooses to discharge the overflowing vertex with the
maximum  height. 

The  best  algorithm  to  date  for  maximum  bipartite  matching,  discovered  by 
Hopcroft and Karp [176], runs in $O(\sqrt{V} E)$ time and is described in Problem 26-6.
The book by Lovász and Plummer [239] is an excellent reference on matching
problems. 

Introduction


This  part  contains  a  selection  of  algorithmic  topics  that  extend  and  complement 
earlier  material  in  this  book.  Some  chapters  introduce  new  models  of  computation 
such  as  circuits  or  parallel  computers.  Others  cover  specialized  domains  such  as 
computational  geometry  or  number  theory.  The  last  two  chapters  discuss  some  of 
the  known  limitations  to  the  design  of  efficient  algorithms  and  introduce  techniques 
for  coping  with  those  limitations. 

Chapter 27 presents an algorithmic model for parallel computing based on dynamic
multithreading. The chapter introduces the basics of the model, showing
how  to  quantify  parallelism  in  terms  of  the  measures  of  work  and  span.  It  then 
investigates  several  interesting  multithreaded  algorithms,  including  algorithms  for 
matrix  multiplication  and  merge  sorting. 

Chapter  28  studies  efficient  algorithms  for  operating  on  matrices.  It  presents 
two  general  methods— LU  decomposition  and  LUP  decomposition— for  solving 
linear equations by Gaussian elimination in $O(n^3)$ time. It also shows that matrix
inversion and matrix multiplication can be performed equally fast. The chapter
concludes by showing how to compute a least-squares approximate solution when
a set of linear equations has no exact solution.

Chapter 29 studies linear programming, in which we wish to maximize or minimize
an objective, given limited resources and competing constraints. Linear programming
arises in a variety of practical application areas. This chapter covers how
to formulate and solve linear programs. The solution method covered is the simplex
algorithm, which is the oldest algorithm for linear programming. In contrast
to  many  algorithms  in  this  book,  the  simplex  algorithm  does  not  run  in  polynomial 
time  in  the  worst  case,  but  it  is  fairly  efficient  and  widely  used  in  practice. 

Chapter 30 studies operations on polynomials and shows how to use a well-known
signal-processing technique, the fast Fourier transform (FFT), to multiply
two degree-$n$ polynomials in $O(n \lg n)$ time. It also investigates efficient
implementations of the FFT, including a parallel circuit.

Chapter  31  presents  number-theoretic  algorithms.  After  reviewing  elementary 
number theory, it presents Euclid’s algorithm for computing greatest common divisors.
Next, it studies algorithms for solving modular linear equations and for
raising one number to a power modulo another number. Then, it explores an important
application of number-theoretic algorithms: the RSA public-key cryptosystem.
This  cryptosystem  can  be  used  not  only  to  encrypt  messages  so  that  an  adversary 
cannot  read  them,  but  also  to  provide  digital  signatures.  The  chapter  then  presents 
the  Miller-Rabin  randomized  primality  test,  with  which  we  can  find  large  primes 
efficiently, an essential requirement for the RSA system. Finally, the chapter covers
Pollard’s “rho” heuristic for factoring integers and discusses the state of the art
of  integer  factorization. 

Chapter  32  studies  the  problem  of  finding  all  occurrences  of  a  given  pattern 
string in a given text string, a problem that arises frequently in text-editing programs.
After examining the naive approach, the chapter presents an elegant approach
due to Rabin and Karp. Then, after showing an efficient solution based
on  finite  automata,  the  chapter  presents  the  Knuth-Morris-Pratt  algorithm,  which 
modifies  the  automaton-based  algorithm  to  save  space  by  cleverly  preprocessing 
the  pattern. 

Chapter 33 considers a few problems in computational geometry. After discussing
basic primitives of computational geometry, the chapter shows how to use
a “sweeping” method to efficiently determine whether a set of line segments contains
any intersections. Two clever algorithms for finding the convex hull of a set of
points— Graham’s  scan  and  Jarvis’s  march— also  illustrate  the  power  of  sweeping 
methods.  The  chapter  closes  with  an  efficient  algorithm  for  finding  the  closest  pair 
from  among  a  given  set  of  points  in  the  plane. 

Chapter  34  concerns  NP-complete  problems.  Many  interesting  computational 
problems  are  NP-complete,  but  no  polynomial-time  algorithm  is  known  for  solving 
any  of  them.  This  chapter  presents  techniques  for  determining  when  a  problem  is 
NP-complete.  Several  classic  problems  are  proved  to  be  NP-complete:  determining 
whether  a  graph  has  a  hamiltonian  cycle,  determining  whether  a  boolean  formula 
is  satisfiable,  and  determining  whether  a  given  set  of  numbers  has  a  subset  that 
adds  up  to  a  given  target  value.  The  chapter  also  proves  that  the  famous  traveling- 
salesman  problem  is  NP-complete. 

Chapter  35  shows  how  to  find  approximate  solutions  to  NP-complete  problems 
efficiently  by  using  approximation  algorithms.  For  some  NP-complete  problems, 
approximate  solutions  that  are  near  optimal  are  quite  easy  to  produce,  but  for  others 
even  the  best  approximation  algorithms  known  work  progressively  more  poorly  as 
the problem size increases. Then, there are some problems for which we can invest
increasing amounts of computation time in return for increasingly better approximate
solutions. This chapter illustrates these possibilities with the vertex-cover
problem  (unweighted  and  weighted  versions),  an  optimization  version  of  3-CNF 
satisfiability,  the  traveling-salesman  problem,  the  set-covering  problem,  and  the 
subset-sum  problem. 

The vast majority of algorithms in this book are serial algorithms suitable for
running on a uniprocessor computer in which only one instruction executes at a
time. In this chapter, we shall extend our algorithmic model to encompass parallel
algorithms, which can run on a multiprocessor computer that permits multiple
instructions  to  execute  concurrently.  In  particular,  we  shall  explore  the  elegant 
model  of  dynamic  multithreaded  algorithms,  which  are  amenable  to  algorithmic 
design  and  analysis,  as  well  as  to  efficient  implementation  in  practice. 

Parallel  computers— computers  with  multiple  processing  units— have  become 
increasingly common, and they span a wide range of prices and performance. Relatively
inexpensive desktop and laptop chip multiprocessors contain a single multicore
integrated-circuit chip that houses multiple processing “cores,” each of which
is a full-fledged processor that can access a common memory. At an intermediate
price/performance point are clusters built from individual computers (often
simple PC-class machines) with a dedicated network interconnecting them. The
highest-priced  machines  are  supercomputers,  which  often  use  a  combination  of 
custom  architectures  and  custom  networks  to  deliver  the  highest  performance  in 
terms  of  instructions  executed  per  second. 

Multiprocessor computers have been around, in one form or another, for
decades. Although the computing community settled on the random-access machine
model for serial computing early on in the history of computer science, no
single model for parallel computing has gained as wide acceptance. A major reason
is that vendors have not agreed on a single architectural model for parallel
computers. For example, some parallel computers feature shared memory, where
each processor can directly access any location of memory. Other parallel computers
employ distributed memory, where each processor’s memory is private, and
an explicit message must be sent between processors in order for one processor to
access the memory of another. With the advent of multicore technology, however,
every new laptop and desktop machine is now a shared-memory parallel computer,
and the trend appears to be toward shared-memory multiprocessing. Although time
will tell, that is the approach we shall take in this chapter.

One common means of programming chip multiprocessors and other shared-memory
parallel computers is by using static threading, which provides a software
abstraction of “virtual processors,” or threads, sharing a common memory. Each
thread maintains an associated program counter and can execute code independently
of the other threads. The operating system loads a thread onto a processor
for execution and switches it out when another thread needs to run. Although the
operating system allows programmers to create and destroy threads, these operations
are comparatively slow. Thus, for most applications, threads persist for the
duration of a computation, which is why we call them “static.”

Unfortunately,  programming  a  shared-memory  parallel  computer  directly  using 
static threads is difficult and error-prone. One reason is that dynamically partitioning
the work among the threads so that each thread receives approximately
the same load turns out to be a complicated undertaking. For any but the simplest
of applications, the programmer must use complex communication protocols
to  implement  a  scheduler  to  load-balance  the  work.  This  state  of  affairs  has  led 
toward  the  creation  of  concurrency  platforms,  which  provide  a  layer  of  software 
that  coordinates,  schedules,  and  manages  the  parallel-computing  resources.  Some 
concurrency  platforms  are  built  as  runtime  libraries,  but  others  provide  full-fledged 
parallel  languages  with  compiler  and  runtime  support. 

Dynamic  multithreaded  programming 

One important class of concurrency platform is dynamic multithreading, which is
the model we shall adopt in this chapter. Dynamic multithreading allows programmers
to specify parallelism in applications without worrying about communication
protocols,  load  balancing,  and  other  vagaries  of  static-thread  programming.  The 
concurrency  platform  contains  a  scheduler,  which  load-balances  the  computation 
automatically,  thereby  greatly  simplifying  the  programmer’s  chore.  Although  the 
functionality  of  dynamic-multithreading  environments  is  still  evolving,  almost  all 
support  two  features:  nested  parallelism  and  parallel  loops.  Nested  parallelism 
allows  a  subroutine  to  be  “spawned,”  allowing  the  caller  to  proceed  while  the 
spawned  subroutine  is  computing  its  result.  A  parallel  loop  is  like  an  ordinary 
for  loop,  except  that  the  iterations  of  the  loop  can  execute  concurrently. 

These  two  features  form  the  basis  of  the  model  for  dynamic  multithreading  that 
we  shall  study  in  this  chapter.  A  key  aspect  of  this  model  is  that  the  programmer 
needs  to  specify  only  the  logical  parallelism  within  a  computation,  and  the  threads 
within the underlying concurrency platform schedule and load-balance the computation
among themselves. We shall investigate multithreaded algorithms written for




Chapter  27  Multithreaded  Algorithms 


this model, as well as how the underlying concurrency platform can schedule computations
efficiently.

Our  model  for  dynamic  multithreading  offers  several  important  advantages: 

•  It  is  a  simple  extension  of  our  serial  programming  model.  We  can  describe  a 
multithreaded  algorithm  by  adding  to  our  pseudocode  just  three  “concurrency” 
keywords: parallel, spawn, and sync. Moreover, if we delete these concurrency
keywords from the multithreaded pseudocode, the resulting text is serial
pseudocode for the same problem, which we call the “serialization” of the multithreaded
algorithm.

•  It provides a theoretically clean way to quantify parallelism based on the notions
of “work” and “span.”

•  Many  multithreaded  algorithms  involving  nested  parallelism  follow  naturally 
from  the  divide-and-conquer  paradigm.  Moreover,  just  as  serial  divide-and- 
conquer  algorithms  lend  themselves  to  analysis  by  solving  recurrences,  so  do 
multithreaded  algorithms. 

•  The model is faithful to how parallel-computing practice is evolving. A growing
number of concurrency platforms support one variant or another of dynamic
multithreading, including Cilk [51, 118], Cilk++ [71], OpenMP [59], Task Parallel
Library [230], and Threading Building Blocks [292].

Section 27.1 introduces the dynamic multithreading model and presents the metrics
of work, span, and parallelism, which we shall use to analyze multithreaded
algorithms. Section 27.2 investigates how to multiply matrices with multithreading,
and Section 27.3 tackles the tougher problem of multithreading merge sort.


27.1  The  basics  of  dynamic  multithreading 

We  shall  begin  our  exploration  of  dynamic  multithreading  using  the  example  of 
computing  Fibonacci  numbers  recursively.  Recall  that  the  Fibonacci  numbers  are 
defined  by  recurrence  (3.22): 

$$
\begin{aligned}
F_0 &= 0 , \\
F_1 &= 1 , \\
F_i &= F_{i-1} + F_{i-2} \quad \text{for } i \ge 2 .
\end{aligned}
$$

Here is a simple, recursive, serial algorithm to compute the $n$th Fibonacci number:

Figure 27.1 The tree of recursive procedure instances when computing Fib(6). Each instance of
Fib with the same argument does the same work to produce the same result, providing an inefficient
but interesting way to compute Fibonacci numbers.


Fib(n)
1  if n ≤ 1
2      return n
3  else x = Fib(n - 1)
4      y = Fib(n - 2)
5      return x + y

You would not really want to compute large Fibonacci numbers this way, because
this computation does much repeated work. Figure 27.1 shows the tree of
recursive procedure instances that are created when computing $F_6$. For example,
a call to Fib(6) recursively calls Fib(5) and then Fib(4). But, the call to Fib(5)
also results in a call to Fib(4). Both instances of Fib(4) return the same result
($F_4 = 3$). Since the Fib procedure does not memoize, the second call to Fib(4)
replicates the work that the first call performs.
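To make the repeated work concrete, here is a small Python transcription of the Fib procedure that also records how often each argument is seen. The `calls` counter is an addition for illustration only and is not part of the pseudocode.

```python
from collections import Counter

calls = Counter()  # calls[k] = number of procedure instances with argument k

def fib(n):
    """Direct transcription of the serial Fib procedure, with call counting."""
    calls[n] += 1
    if n <= 1:
        return n
    x = fib(n - 1)
    y = fib(n - 2)
    return x + y

result = fib(6)
print(result, dict(sorted(calls.items())))
```

Computing `fib(6)` creates two instances with argument 4, matching the discussion above, and the smaller arguments are repeated even more often.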

Let $T(n)$ denote the running time of Fib(n). Since Fib(n) contains two recursive
calls plus a constant amount of extra work, we obtain the recurrence

$$T(n) = T(n - 1) + T(n - 2) + \Theta(1) .$$

This recurrence has solution $T(n) = \Theta(F_n)$, which we can show using the substitution
method. For an inductive hypothesis, assume that $T(n) \le aF_n - b$, where
$a > 1$ and $b > 0$ are constants. Substituting, we obtain

$$
\begin{aligned}
T(n) &\le (aF_{n-1} - b) + (aF_{n-2} - b) + \Theta(1) \\
     &= a(F_{n-1} + F_{n-2}) - 2b + \Theta(1) \\
     &= aF_n - b - (b - \Theta(1)) \\
     &\le aF_n - b
\end{aligned}
$$

if we choose $b$ large enough to dominate the constant in the $\Theta(1)$. We can then
choose $a$ large enough to satisfy the initial condition. The analytical bound

$$T(n) = \Theta(\phi^n) , \qquad (27.1)$$

where $\phi = (1 + \sqrt{5})/2$ is the golden ratio, now follows from equation (3.25).
Since $F_n$ grows exponentially in $n$, this procedure is a particularly slow way to
compute Fibonacci numbers. (See Problem 31-3 for much faster ways.)
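A quick numerical check of bound (27.1) is sketched below in Python. It evaluates the recurrence with the $\Theta(1)$ term taken to be exactly 1 and $T(0) = T(1) = 1$, which are assumptions chosen so that $T(n)$ counts procedure instances; the function name `instance_counts` is likewise hypothetical.

```python
import math

def instance_counts(n_max):
    """T(n) = T(n-1) + T(n-2) + 1 with T(0) = T(1) = 1, for n = 0..n_max."""
    counts = [1, 1]
    for n in range(2, n_max + 1):
        counts.append(1 + counts[n - 1] + counts[n - 2])
    return counts

counts = instance_counts(30)
phi = (1 + math.sqrt(5)) / 2
ratio = counts[30] / counts[29]  # growth factor from one n to the next
print(counts[6], ratio, phi)
```

The ratio of consecutive values settles near the golden ratio, which is the behavior a $\Theta(\phi^n)$ bound predicts.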

Although  the  Fib  procedure  is  a  poor  way  to  compute  Fibonacci  numbers,  it 
makes  a  good  example  for  illustrating  key  concepts  in  the  analysis  of  multithreaded 
algorithms. Observe that within Fib(n), the two recursive calls in lines 3 and 4 to
Fib(n - 1) and Fib(n - 2), respectively, are independent of each other: they could
be  called  in  either  order,  and  the  computation  performed  by  one  in  no  way  affects 
the  other.  Therefore,  the  two  recursive  calls  can  run  in  parallel. 

We  augment  our  pseudocode  to  indicate  parallelism  by  adding  the  concurrency 
keywords  spawn  and  sync.  Here  is  how  we  can  rewrite  the  Fib  procedure  to  use 
dynamic  multithreading: 

P-Fib(n)
1  if n ≤ 1
2      return n
3  else x = spawn P-Fib(n - 1)
4      y = P-Fib(n - 2)
5      sync
6      return x + y

Notice  that  if  we  delete  the  concurrency  keywords  spawn  and  sync  from  P-Fib, 
the  resulting  pseudocode  text  is  identical  to  Fib  (other  than  renaming  the  procedure 
in the header and in the two recursive calls). We define the serialization of a multithreaded
algorithm to be the serial algorithm that results from deleting the multithreaded
keywords: spawn, sync, and when we examine parallel loops, parallel.
Indeed,  our  multithreaded  pseudocode  has  the  nice  property  that  a  serialization  is 
always  ordinary  serial  pseudocode  to  solve  the  same  problem. 

Nested  parallelism  occurs  when  the  keyword  spawn  precedes  a  procedure  call, 
as  in  line  3.  The  semantics  of  a  spawn  differs  from  an  ordinary  procedure  call  in 
that  the  procedure  instance  that  executes  the  spawn— the  parent— may  continue 
to  execute  in  parallel  with  the  spawned  subroutine— its  child— instead  of  waiting 
for the child to complete, as would normally happen in a serial execution. In this
case, while the spawned child is computing P-Fib(n - 1), the parent may go on
to compute P-Fib(n - 2) in line 4 in parallel with the spawned child. Since the
P-Fib procedure is recursive, these two subroutine calls themselves create nested
parallelism, as do their children, thereby creating a potentially vast tree of subcomputations,
all executing in parallel.

The keyword spawn does not say, however, that a procedure must execute concurrently
with its spawned children, only that it may. The concurrency keywords
express the logical parallelism of the computation, indicating which parts of the
computation may proceed in parallel. At runtime, it is up to a scheduler to determine
which subcomputations actually run concurrently by assigning them to available
processors as the computation unfolds. We shall discuss the theory behind
schedulers  shortly. 

A  procedure  cannot  safely  use  the  values  returned  by  its  spawned  children  until 
after  it  executes  a  sync  statement,  as  in  line  5.  The  keyword  sync  indicates  that 
the procedure must wait as necessary for all its spawned children to complete before
proceeding to the statement after the sync. In the P-Fib procedure, a sync
is  required  before  the  return  statement  in  line  6  to  avoid  the  anomaly  that  would 
occur  if  x  and  y  were  summed  before  x  was  computed.  In  addition  to  explicit 
synchronization  provided  by  the  sync  statement,  every  procedure  executes  a  sync 
implicitly  before  it  returns,  thus  ensuring  that  all  its  children  terminate  before  it 
does. 
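The spawn and sync semantics described above can be imitated in Python with one operating-system thread per spawn. This is only a sketch of the semantics under loud assumptions: starting a thread stands in for spawn and `join` stands in for sync, the `result` dictionary is a hypothetical way to pass the child's value back, and creating a thread per spawn is exactly the kind of heavyweight static threading that a dynamic-multithreading platform is meant to avoid. It should not be expected to run faster than the serial version.

```python
import threading

def p_fib(n):
    """Imitation of P-Fib: thread start plays spawn, join plays sync."""
    if n <= 1:
        return n
    result = {}

    def child():
        result["x"] = p_fib(n - 1)

    t = threading.Thread(target=child)
    t.start()              # line 3: spawn the child computing P-Fib(n - 1)
    y = p_fib(n - 2)       # line 4: the parent continues in the meantime
    t.join()               # line 5: sync, wait for the spawned child
    return result["x"] + y  # line 6: x is safe to read only after the sync

print(p_fib(10))
```

Reading `result["x"]` before the `join` would be the anomaly the text warns about: the parent could sum the values before the child has produced `x`.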

A  model  for  multithreaded  execution 

It helps to think of a multithreaded computation (the set of runtime instructions
executed by a processor on behalf of a multithreaded program) as a directed
acyclic graph $G = (V, E)$, called a computation dag. As an example, Figure 27.2
shows the computation dag that results from computing P-Fib(4). Conceptually,
the vertices in $V$ are instructions, and the edges in $E$ represent dependencies between
instructions, where $(u, v) \in E$ means that instruction $u$ must execute before
instruction $v$. For convenience, however, if a chain of instructions contains no
parallel control (no spawn, sync, or return from a spawn, via either an explicit
return statement or the return that happens implicitly upon reaching the end of
a procedure), we may group them into a single strand, each of which represents
one or more instructions. Instructions involving parallel control are not included
in strands, but are represented in the structure of the dag. For example, if a strand
has two successors, one of them must have been spawned, and a strand with multiple
predecessors indicates the predecessors joined because of a sync statement.
Thus, in the general case, the set $V$ forms the set of strands, and the set $E$ of directed
edges represents dependencies between strands induced by parallel control.

Figure 27.2 A directed acyclic graph representing the computation of P-Fib(4). Each circle represents
one strand, with black circles representing either base cases or the part of the procedure
(instance) up to the spawn of P-Fib(n - 1) in line 3, shaded circles representing the part of the procedure
that calls P-Fib(n - 2) in line 4 up to the sync in line 5, where it suspends until the spawn of
P-Fib(n - 1) returns, and white circles representing the part of the procedure after the sync where
it sums x and y up to the point where it returns the result. Each group of strands belonging to the
same procedure is surrounded by a rounded rectangle, lightly shaded for spawned procedures and
heavily shaded for called procedures. Spawn edges and call edges point downward, continuation
edges point horizontally to the right, and return edges point upward. Assuming that each strand takes
unit time, the work equals 17 time units, since there are 17 strands, and the span is 8 time units, since
the critical path shown with shaded edges contains 8 strands.


If $G$ has a directed path from strand $u$ to strand $v$, we say that the two strands are
(logically)  in  series.  Otherwise,  strands  u  and  v  are  (logically)  in  parallel. 
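The series/parallel distinction amounts to a reachability test, as the Python sketch below shows. The dag is a hand-built guess at the strands of a single P-Fib(2) instance (black strand `a`, shaded strand `b`, white strand `w`, plus base-case strands `c1` and `c0` for the spawned and called children); the strand names are hypothetical, and the sketch reads the definition symmetrically, treating two strands as in series when a directed path exists in either direction.

```python
def reachable(dag, u, v):
    """Return True if the dag has a directed path from u to v."""
    stack, seen = [u], set()
    while stack:
        x = stack.pop()
        if x == v:
            return True
        if x in seen:
            continue
        seen.add(x)
        stack.extend(dag.get(x, []))
    return False

def in_series(dag, u, v):
    """Strands are (logically) in series if one can reach the other."""
    return reachable(dag, u, v) or reachable(dag, v, u)

# Assumed strand dag for P-Fib(2): a spawns c1 and continues to b,
# b calls c0, and both children return to the white strand w.
dag = {"a": ["c1", "b"], "b": ["c0"], "c1": ["w"], "c0": ["w"]}

print(in_series(dag, "a", "w"), in_series(dag, "c1", "b"))
```

Here the spawned child `c1` and the parent's continuation `b` are in parallel, while the initial strand `a` and the final strand `w` are in series.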

We  can  picture  a  multithreaded  computation  as  a  dag  of  strands  embedded  in  a 
tree of procedure instances. For example, Figure 27.1 shows the tree of procedure
instances for P-Fib(6) without the detailed structure showing strands. Figure 27.2
zooms in on a section of that tree, showing the strands that constitute each procedure.
All directed edges connecting strands run either within a procedure or along
undirected  edges  in  the  procedure  tree. 

We can classify the edges of a computation dag to indicate the kind of dependencies
between the various strands. A continuation edge $(u, u')$, drawn horizontally
in Figure 27.2, connects a strand $u$ to its successor $u'$ within the same procedure
instance. When a strand $u$ spawns a strand $v$, the dag contains a spawn edge $(u, v)$,
which points downward in the figure. Call edges, representing normal procedure
calls, also point downward. Strand $u$ spawning strand $v$ differs from $u$ calling $v$
in that a spawn induces a horizontal continuation edge from $u$ to the strand $u'$ following
$u$ in its procedure, indicating that $u'$ is free to execute at the same time
as $v$, whereas a call induces no such edge. When a strand $u$ returns to its calling
procedure and $x$ is the strand immediately following the next sync in the calling
procedure, the computation dag contains return edge $(u, x)$, which points upward.
A computation starts with a single initial strand (the black vertex in the procedure
labeled P-Fib(4) in Figure 27.2) and ends with a single final strand (the white
vertex in the procedure labeled P-Fib(4)).
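The strand counts quoted in the caption of Figure 27.2 can be reproduced with a short Python sketch. It assumes, following the caption and the edge rules above, that every base-case instance is one unit-time strand, that every other instance consists of three unit-time strands (black, shaded, white), and that the returns from both the spawned child and the called child lead into the white strand. The function names `work` and `span` are chosen for this example.

```python
def work(n):
    """Total number of unit-time strands in the dag for P-Fib(n)."""
    if n <= 1:
        return 1
    return 3 + work(n - 1) + work(n - 2)

def span(n):
    """Number of strands on a longest path in the dag for P-Fib(n)."""
    if n <= 1:
        return 1
    via_spawn = 1 + span(n - 1) + 1     # black, spawned child, white
    via_call = 1 + 1 + span(n - 2) + 1  # black, shaded, called child, white
    return max(via_spawn, via_call)

print(work(4), span(4))
```

For P-Fib(4) this gives 17 strands in total and 8 strands on a longest path, in agreement with the caption.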

We shall study the execution of multithreaded algorithms on an ideal parallel
computer, which consists of a set of processors and a sequentially consistent
shared  memory.  Sequential  consistency  means  that  the  shared  memory,  which  may 
in  reality  be  performing  many  loads  and  stores  from  the  processors  at  the  same 
time,  produces  the  same  results  as  if  at  each  step,  exactly  one  instruction  from  one 
of  the  processors  is  executed.  That  is,  the  memory  behaves  as  if  the  instructions 
were executed sequentially according to some global linear order that preserves the
individual orders in which each processor issues its own instructions. For dynamic
multithreaded computations, which are scheduled onto processors automatically
by the concurrency platform, the shared memory behaves as if the multithreaded
computation’s instructions were interleaved to produce a linear order that preserves
the partial order of the computation dag. Depending on scheduling, the ordering
could differ from one run of the program to another, but the behavior of any execution
can be understood by assuming that the instructions are executed in some
linear  order  consistent  with  the  computation  dag. 

In  addition  to  making  assumptions  about  semantics,  the  ideal-parallel-computer 
model  makes  some  performance  assumptions.  Specifically,  it  assumes  that  each 
processor  in  the  machine  has  equal  computing  power,  and  it  ignores  the  cost  of 
scheduling.  Although  this  last  assumption  may  sound  optimistic,  it  turns  out  that 
for  algorithms  with  sufficient  “parallelism”  (a  term  we  shall  define  precisely  in  a 
moment),  the  overhead  of  scheduling  is  generally  minimal  in  practice. 

Performance  measures 

We  can  gauge  the  theoretical  efficiency  of  a  multithreaded  algorithm  by  using  two 
metrics:  “work”  and  “span.”  The  work  of  a  multithreaded  computation  is  the  total 
time  to  execute  the  entire  computation  on  one  processor.  In  other  words,  the  work 
is  the  sum  of  the  times  taken  by  each  of  the  strands.  For  a  computation  dag  in 
which  each  strand  takes  unit  time,  the  work  is  just  the  number  of  vertices  in  the 
dag.  The  span  is  the  longest  time  to  execute  the  strands  along  any  path  in  the  dag. 
Again,  for  a  dag  in  which  each  strand  takes  unit  time,  the  span  equals  the  number  of 
vertices on a longest or critical path in the dag. (Recall from Section 24.2 that we
can find a critical path in a dag $G = (V, E)$ in $\Theta(V + E)$ time.) For example, the
computation dag of Figure 27.2 has 17 vertices in all and 8 vertices on its critical
path, so that if each strand takes unit time, its work is 17 time units and its span
is 8 time units.
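
As a rough illustration of these two measures, the Python sketch below computes
the work and span of a small dag. The function name work_and_span, the
dictionary-based encoding of the dag, and the five-strand example are assumptions
made for this sketch; the example is not the dag of Figure 27.2.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def work_and_span(times, edges):
    """Return (work, span) of a computation dag.

    times: dict mapping each strand to its execution time.
    edges: iterable of (u, v) pairs; strand u must finish before v starts.
    """
    preds = {v: set() for v in times}
    for u, v in edges:
        preds[v].add(u)

    work = sum(times.values())

    # finish[v] is the total time along the longest path that ends at v.
    finish = {}
    for v in TopologicalSorter(preds).static_order():
        finish[v] = times[v] + max((finish[u] for u in preds[v]), default=0)
    return work, max(finish.values())


# Hypothetical dag of unit-time strands: a -> b -> c -> e and a -> d -> e.
times = {s: 1 for s in "abcde"}
edges = [("a", "b"), ("b", "c"), ("c", "e"), ("a", "d"), ("d", "e")]
print(work_and_span(times, edges))  # (5, 4)
```

Processing the strands in topological order visits each vertex and edge once,
which matches the linear-time critical-path computation recalled above.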

The actual running time of a multithreaded computation depends not only on
its work and its span, but also on how many processors are available and how
the scheduler allocates strands to processors. To denote the running time of a
multithreaded computation on P processors, we shall subscript by P. For example,
we might denote the running time of an algorithm on P processors by $T_P$. The
work is the running time on a single processor, or $T_1$. The span is the running time
if we could run each strand on its own processor (in other words, if we had an
unlimited number of processors), and so we denote the span by $T_\infty$.

The work and span provide lower bounds on the running time $T_P$ of a
multithreaded computation on P processors:

•  In one step, an ideal parallel computer with P processors can do at most P
units of work, and thus in $T_P$ time, it can perform at most $P T_P$ work. Since the
total work to do is $T_1$, we have $P T_P \ge T_1$. Dividing by P yields the work law:

$$T_P \ge T_1 / P . \qquad (27.2)$$

•  A P-processor ideal parallel computer cannot run any faster than a machine
with an unlimited number of processors. Looked at another way, a machine
with an unlimited number of processors can emulate a P-processor machine by
using just P of its processors. Thus, the span law follows:

$$T_P \ge T_\infty . \qquad (27.3)$$

We define the speedup of a computation on P processors by the ratio $T_1/T_P$,
which says how many times faster the computation is on P processors than
on 1 processor. By the work law, we have $T_P \ge T_1/P$, which implies that
$T_1/T_P \le P$. Thus, the speedup on P processors can be at most P. When the
speedup is linear in the number of processors, that is, when $T_1/T_P = \Theta(P)$, the
computation exhibits linear speedup, and when $T_1/T_P = P$, we have perfect
linear speedup.

The ratio $T_1/T_\infty$ of the work to the span gives the parallelism of the
multithreaded computation. We can view the parallelism from three perspectives. As a
ratio, the parallelism denotes the average amount of work that can be performed in
parallel for each step along the critical path. As an upper bound, the parallelism
gives the maximum possible speedup that can be achieved on any number of
processors. Finally, and perhaps most important, the parallelism provides a limit on
the possibility of attaining perfect linear speedup. Specifically, once the number of
processors exceeds the parallelism, the computation cannot possibly achieve
perfect linear speedup. To see this last point, suppose that $P > T_1/T_\infty$, in which case
the span law implies that the speedup satisfies $T_1/T_P \le T_1/T_\infty < P$. Moreover,
if the number P of processors in the ideal parallel computer greatly exceeds the
parallelism, that is, if $P \gg T_1/T_\infty$, then $T_1/T_P \ll P$, so that the speedup is
much less than the number of processors. In other words, the more processors we
use beyond the parallelism, the less perfect the speedup.

As an example, consider the computation P-Fib(4) in Figure 27.2, and assume
that each strand takes unit time. Since the work is $T_1 = 17$ and the span is $T_\infty = 8$,
the parallelism is $T_1/T_\infty = 17/8 = 2.125$. Consequently, achieving much more
than double the speedup is impossible, no matter how many processors we
employ to execute the computation. For larger input sizes, however, we shall see that
P-Fib(n) exhibits substantial parallelism.

We define the (parallel) slackness of a multithreaded computation executed
on an ideal parallel computer with P processors to be the ratio
$(T_1/T_\infty)/P = T_1/(P T_\infty)$, which is the factor by which the parallelism of the
computation exceeds the number of processors in the machine. Thus, if the slackness
is less than 1, we cannot hope to achieve perfect linear speedup, because
$T_1/(P T_\infty) < 1$ and the span law imply that the speedup on P processors satisfies
$T_1/T_P \le T_1/T_\infty < P$.
Indeed, as the slackness decreases from 1 toward 0, the speedup of the computation
diverges further and further from perfect linear speedup. If the slackness is greater
than 1, however, the work per processor is the limiting constraint. As we shall see,
as the slackness increases from 1, a good scheduler can achieve closer and closer
to perfect linear speedup.
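
The quantities defined so far reduce to a few lines of arithmetic. The Python
sketch below evaluates them for the P-Fib(4) figures quoted above (work 17,
span 8); the function names and the choice of P = 4 processors are assumptions
made for illustration only.

```python
def parallelism(t1, t_inf):
    """Work divided by span."""
    return t1 / t_inf


def slackness(t1, t_inf, p):
    """Parallelism divided by the number of processors."""
    return t1 / (p * t_inf)


def running_time_lower_bound(t1, t_inf, p):
    """Work law (27.2) and span law (27.3): T_P >= max(T_1/P, T_inf)."""
    return max(t1 / p, t_inf)


def speedup_upper_bound(t1, t_inf, p):
    """The speedup T_1/T_P is at most P and at most the parallelism."""
    return min(p, parallelism(t1, t_inf))


t1, t_inf, p = 17, 8, 4   # P-Fib(4) with unit-time strands; p = 4 is assumed
print(parallelism(t1, t_inf))                  # 2.125
print(slackness(t1, t_inf, p))                 # 0.53125
print(running_time_lower_bound(t1, t_inf, p))  # 8
print(speedup_upper_bound(t1, t_inf, p))       # 2.125
```

With a slackness below 1, the span law is the binding constraint, so adding
processors beyond the parallelism cannot lower the bound of 8 time units.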

Scheduling 

Good performance depends on more than just minimizing the work and span. The
strands must also be scheduled efficiently onto the processors of the parallel
machine. Our multithreaded programming model provides no way to specify which
strands to execute on which processors. Instead, we rely on the concurrency
platform’s scheduler to map the dynamically unfolding computation to individual
processors. In practice, the scheduler maps the strands to static threads, and the
operating system schedules the threads on the processors themselves, but this extra
level of indirection is unnecessary for our understanding of scheduling. We can
just imagine that the concurrency platform’s scheduler maps strands to processors
directly.

A multithreaded scheduler must schedule the computation with no advance
knowledge of when strands will be spawned or when they will complete; it must
operate on-line. Moreover, a good scheduler operates in a distributed fashion,
where the threads implementing the scheduler cooperate to load-balance the
computation. Provably good on-line, distributed schedulers exist, but analyzing them
is complicated.


Instead, to keep our analysis simple, we shall investigate an on-line centralized
scheduler, which knows the global state of the computation at any given time. In
particular, we shall analyze greedy schedulers, which assign as many strands to
processors as possible in each time step. If at least P strands are ready to execute
during a time step, we say that the step is a complete step, and a greedy scheduler
assigns any P of the ready strands to processors. Otherwise, fewer than P strands
are ready to execute, in which case we say that the step is an incomplete step, and
the scheduler assigns each ready strand to its own processor.
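
The greedy rule just described can be simulated in a few lines. The Python sketch
below assumes unit-time strands and a dag given as a dictionary from each strand
to the set of strands it depends on; the function name greedy_schedule, this
encoding, and the fan-out example dag are hypothetical choices for illustration.

```python
def greedy_schedule(preds, p):
    """Simulate a greedy scheduler on a dag of unit-time strands.

    preds: dict mapping each strand to the set of strands it depends on.
    p: number of processors.
    Returns (total_steps, complete_steps, incomplete_steps).
    """
    remaining = {v: set(ps) for v, ps in preds.items()}
    done = set()
    complete = incomplete = 0
    while remaining:
        # A strand is ready once everything it depends on has executed.
        ready = sorted(v for v, ps in remaining.items() if ps <= done)
        if not ready:
            raise ValueError("dependency cycle: the input is not a dag")
        if len(ready) >= p:
            complete += 1      # complete step: all p processors are busy
        else:
            incomplete += 1    # incomplete step: every ready strand runs
        for v in ready[:p]:    # any p ready strands would do
            del remaining[v]
            done.add(v)
    return complete + incomplete, complete, incomplete


# Hypothetical dag: one root strand fans out to six leaves, which then join.
leaves = [f"leaf{k}" for k in range(6)]
preds = {"root": set(), "join": set(leaves)}
preds.update({leaf: {"root"} for leaf in leaves})
print(greedy_schedule(preds, 2))  # (5, 3, 2)
```

This dag has work 8 and span 3, so on 2 processors the work and span laws give
a lower bound of max(8/2, 3) = 4 steps; the simulated greedy schedule takes 5.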

From the work law, the best running time we can hope for on P processors
is $T_P = T_1/P$, and from the span law the best we can hope for is $T_P = T_\infty$.
The following theorem shows that greedy scheduling is provably good in that it
achieves the sum of these two lower bounds as an upper bound.

Theorem  27.1 

On an ideal parallel computer with P processors, a greedy scheduler executes a
multithreaded computation with work $T_1$ and span $T_\infty$ in time

$$T_P \le T_1/P + T_\infty . \qquad (27.4)$$

Proof  We start by considering the complete steps. In each complete step, the
P processors together perform a total of P work. Suppose for the purpose of
contradiction that the number of complete steps is strictly greater than $\lfloor T_1/P \rfloor$.
Then, the total work of the complete steps is at least

$$\begin{aligned}
P \cdot (\lfloor T_1/P \rfloor + 1) &= P \lfloor T_1/P \rfloor + P \\
  &= T_1 - (T_1 \bmod P) + P && \text{(by equation (3.8))} \\
  &> T_1 && \text{(by inequality (3.9))} .
\end{aligned}$$

Thus, we obtain the contradiction that the P processors would perform more work
than the computation requires, which allows us to conclude that the number of
complete steps is at most $\lfloor T_1/P \rfloor$.

Now,  consider  an  incomplete  step.  Let  G  be  the  dag  representing  the  entire 
computation,  and  without  loss  of  generality,  assume  that  each  strand  takes  unit 
time.  (We  can  replace  each  longer  strand  by  a  chain  of  unit-time  strands.)  Let  G' 
be  the  subgraph  of  G  that  has  yet  to  be  executed  at  the  start  of  the  incomplete  step, 
and  let  G"  be  the  subgraph  remaining  to  be  executed  after  the  incomplete  step.  A 
longest  path  in  a  dag  must  necessarily  start  at  a  vertex  with  in-degree  0.  Since  an 
incomplete  step  of  a  greedy  scheduler  executes  all  strands  with  in-degree  0  in  G', 
the  length  of  a  longest  path  in  G"  must  be  1  less  than  the  length  of  a  longest  path 
in  G'.  In  other  words,  an  incomplete  step  decreases  the  span  of  the  unexecuted  dag 
by 1. Hence, the number of incomplete steps is at most $T_\infty$.

Since each step is either complete or incomplete, the theorem follows.  ■

The following corollary to Theorem 27.1 shows that a greedy scheduler always
performs well.

Corollary  27.2 

The running time $T_P$ of any multithreaded computation scheduled by a greedy
scheduler on an ideal parallel computer with P processors is within a factor of 2
of optimal.

Proof  Let $T_P^*$ be the running time produced by an optimal scheduler on a machine
with P processors, and let $T_1$ and $T_\infty$ be the work and span of the computation,
respectively. Since the work and span laws, inequalities (27.2) and (27.3), give
us $T_P^* \ge \max(T_1/P, T_\infty)$, Theorem 27.1 implies that

$$\begin{aligned}
T_P &\le T_1/P + T_\infty \\
    &\le 2 \cdot \max(T_1/P, T_\infty) \\
    &\le 2 T_P^* .
\end{aligned}$$  ■

The  next  corollary  shows  that,  in  fact,  a  greedy  scheduler  achieves  near-perfect 
linear  speedup  on  any  multithreaded  computation  as  the  slackness  grows. 

Corollary  27.3 

Let $T_P$ be the running time of a multithreaded computation produced by a greedy
scheduler on an ideal parallel computer with P processors, and let $T_1$ and $T_\infty$ be
the work and span of the computation, respectively. Then, if $P \ll T_1/T_\infty$, we
have $T_P \approx T_1/P$, or equivalently, a speedup of approximately P.

Proof  If we suppose that $P \ll T_1/T_\infty$, then we also have $T_\infty \ll T_1/P$, and
hence Theorem 27.1 gives us $T_P \le T_1/P + T_\infty \approx T_1/P$. Since the work
law (27.2) dictates that $T_P \ge T_1/P$, we conclude that $T_P \approx T_1/P$, or
equivalently, that the speedup is $T_1/T_P \approx P$.  ■
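
The approximation in Corollary 27.3 can be made quantitative. The derivation
below is a sketch that combines the work law (27.2) with the greedy bound (27.4);
the symbol $s$ for the slackness is introduced here for convenience and is not
notation used elsewhere in the text.

```latex
% Assumption of this sketch: s denotes the slackness T_1 / (P T_\infty).
\begin{align*}
s = \frac{T_1}{P\,T_\infty}
  \;&\Longrightarrow\;
  T_\infty = \frac{1}{s}\cdot\frac{T_1}{P}, \\[4pt]
\frac{T_1}{P} \;\le\; T_P
  \;&\le\; \frac{T_1}{P} + T_\infty
  \;=\; \Bigl(1 + \frac{1}{s}\Bigr)\frac{T_1}{P}, \\[4pt]
\frac{P}{1 + 1/s} \;&\le\; \frac{T_1}{T_P} \;\le\; P .
\end{align*}
```

For instance, a slackness of $s = 4$ already guarantees a greedy speedup of at
least $P/1.25 = 0.8P$, and the guarantee approaches P as $s$ grows.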

The $\ll$ symbol denotes “much less,” but how much is “much less”? As a rule
of thumb, a slackness of at least 10 (that is, 10 times more parallelism than
processors) generally suffices to achieve good speedup. Then, the span term in the
greedy bound, inequality (27.4), is less than 10% of the work-per-processor term,
which is good enough for most engineering situations. For example, if a
computation runs on only 10 or 100 processors, it doesn’t make sense to value parallelism
of, say, 1,000,000 over parallelism of 10,000, even with the factor of 100
difference. As Problem 27-2 shows, sometimes by reducing extreme parallelism, we
can obtain algorithms that are better with respect to other concerns and which still
scale up well on reasonable numbers of processors.

[Figure 27.3, panel (a), series composition:
Work: $T_1(A \cup B) = T_1(A) + T_1(B)$;  Span: $T_\infty(A \cup B) = T_\infty(A) + T_\infty(B)$.
Panel (b), parallel composition:
Work: $T_1(A \cup B) = T_1(A) + T_1(B)$;  Span: $T_\infty(A \cup B) = \max(T_\infty(A), T_\infty(B))$.]

Figure 27.3  The work and span of composed subcomputations. (a) When two subcomputations
are joined in series, the work of the composition is the sum of their work, and the span of the
composition is the sum of their spans. (b) When two subcomputations are joined in parallel, the
work of the composition remains the sum of their work, but the span of the composition is only the
maximum of their spans.

Analyzing  multithreaded  algorithms 

We now have all the tools we need to analyze multithreaded algorithms and provide
good bounds on their running times on various numbers of processors. Analyzing
the work is relatively straightforward, since it amounts to nothing more than
analyzing the running time of an ordinary serial algorithm (namely, the serialization
of the multithreaded algorithm), which you should already be familiar with, since
that is what most of this textbook is about! Analyzing the span is more interesting,
but generally no harder once you get the hang of it. We shall investigate the basic
ideas using the P-Fib program.

Analyzing the work $T_1(n)$ of P-Fib(n) poses no hurdles, because we’ve already
done it. The original Fib procedure is essentially the serialization of P-Fib, and
hence $T_1(n) = T(n) = \Theta(\phi^n)$ from equation (27.1).

Figure 27.3 illustrates how to analyze the span. If two subcomputations are
joined in series, their spans add to form the span of their composition, whereas
if they are joined in parallel, the span of their composition is the maximum of the
spans of the two subcomputations. For P-Fib(n), the spawned call to P-Fib(n − 1)
in line 3 runs in parallel with the call to P-Fib(n − 2) in line 4. Hence, we can
express the span of P-Fib(n) as the recurrence

$$\begin{aligned}
T_\infty(n) &= \max(T_\infty(n-1), T_\infty(n-2)) + \Theta(1) \\
            &= T_\infty(n-1) + \Theta(1) ,
\end{aligned}$$

which has solution $T_\infty(n) = \Theta(n)$.

The parallelism of P-Fib(n) is $T_1(n)/T_\infty(n) = \Theta(\phi^n/n)$, which grows
dramatically as n gets large. Thus, on even the largest parallel computers, a modest
value for n suffices to achieve near perfect linear speedup for P-Fib(n), because
this procedure exhibits considerable parallel slackness.
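
To see how quickly this ratio grows, the Python sketch below evaluates the work
and span recurrences under a simplified cost model that charges one time unit per
procedure instance. That model is an assumption of this sketch: it does not count
the individual strands of Figure 27.2, so its values for n = 4 differ from the 17
and 8 quoted earlier, although the asymptotic behavior is the same.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib_work(n):
    """Work: both recursive calls are charged, plus one unit for this call."""
    if n <= 1:
        return 1
    return fib_work(n - 1) + fib_work(n - 2) + 1


@lru_cache(maxsize=None)
def fib_span(n):
    """Span: the two calls run in parallel, so only the longer one counts."""
    if n <= 1:
        return 1
    return max(fib_span(n - 1), fib_span(n - 2)) + 1


for n in (10, 20, 30):
    work, span = fib_work(n), fib_span(n)
    print(n, work, span, round(work / span, 1))
```

Under this model the work roughly doubles every step and a half while the span
grows by one unit per step, which is the exponential-over-linear behavior derived
above.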

Parallel  loops 

Many  algorithms  contain  loops  all  of  whose  iterations  can  operate  in  parallel.  As 
we  shall  see,  we  can  parallelize  such  loops  using  the  spawn  and  sync  keywords, 
but  it  is  much  more  convenient  to  specify  directly  that  the  iterations  of  such  loops 
can  run  concurrently.  Our  pseudocode  provides  this  functionality  via  the  parallel 
concurrency  keyword,  which  precedes  the  for  keyword  in  a  for  loop  statement. 

As an example, consider the problem of multiplying an $n \times n$ matrix $A = (a_{ij})$
by an n-vector $x = (x_j)$. The resulting n-vector $y = (y_i)$ is given by the equation

$$y_i = \sum_{j=1}^{n} a_{ij} x_j ,$$

for $i = 1, 2, \ldots, n$. We can perform matrix-vector multiplication by computing all
the entries of y in parallel as follows:

Mat-Vec(A, x)
1  n = A.rows
2  let y be a new vector of length n
3  parallel for i = 1 to n
4      y_i = 0
5  parallel for i = 1 to n
6      for j = 1 to n
7          y_i = y_i + a_ij x_j
8  return y

In this code, the parallel for keywords in lines 3 and 5 indicate that the
iterations of the respective loops may be run concurrently. A compiler can implement
each parallel for loop as a divide-and-conquer subroutine using nested parallelism.
For example, the parallel for loop in lines 5-7 can be implemented with the call
Mat-Vec-Main-Loop(A, x, y, n, 1, n), where the compiler produces the
auxiliary subroutine Mat-Vec-Main-Loop as follows:

Mat-Vec-Main-Loop(A, x, y, n, i, i')
1  if i == i'
2      for j = 1 to n
3          y_i = y_i + a_ij x_j
4  else mid = ⌊(i + i')/2⌋
5      spawn Mat-Vec-Main-Loop(A, x, y, n, i, mid)
6      Mat-Vec-Main-Loop(A, x, y, n, mid + 1, i')
7      sync

Figure 27.4  A dag representing the computation of Mat-Vec-Main-Loop(A, x, y, 8, 1, 8). The
two numbers within each rounded rectangle give the values of the last two parameters (i and i' in
the procedure header) in the invocation (spawn or call) of the procedure. The black circles
represent strands corresponding to either the base case or the part of the procedure up to the spawn of
Mat-Vec-Main-Loop in line 5; the shaded circles represent strands corresponding to the part of
the procedure that calls Mat-Vec-Main-Loop in line 6 up to the sync in line 7, where it suspends
until the spawned subroutine in line 5 returns; and the white circles represent strands corresponding
to the (negligible) part of the procedure after the sync up to the point where it returns.

This  code  recursively  spawns  the  first  half  of  the  iterations  of  the  loop  to  execute 
in  parallel  with  the  second  half  of  the  iterations  and  then  executes  a  sync,  thereby 
creating  a  binary  tree  of  execution  where  the  leaves  are  individual  loop  iterations, 
as  shown  in  Figure  27.4. 
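
The same divide-and-conquer structure can be mimicked in Python with one
operating-system thread per spawn. This is only a sketch of the control flow, not
how a concurrency platform implements parallel loops: the function names, the
0-based indexing, and the use of threading.Thread as a stand-in for spawn and
join as a stand-in for sync are all assumptions made here, and the thread
overhead means it is not meant to run fast.

```python
import threading


def mat_vec_main_loop(A, x, y, n, i, i_end):
    """Handle rows i..i_end (inclusive, 0-based) of the product y = A x."""
    if i == i_end:
        # Base case: one iteration of the original parallel for loop.
        for j in range(n):
            y[i] += A[i][j] * x[j]
    else:
        mid = (i + i_end) // 2
        # "spawn": the first half may run concurrently with the second half.
        child = threading.Thread(
            target=mat_vec_main_loop, args=(A, x, y, n, i, mid))
        child.start()
        mat_vec_main_loop(A, x, y, n, mid + 1, i_end)
        child.join()  # "sync": wait for the spawned half to finish


def mat_vec(A, x):
    n = len(A)
    y = [0] * n
    if n > 0:
        mat_vec_main_loop(A, x, y, n, 0, n - 1)
    return y


print(mat_vec([[1, 2], [3, 4]], [5, 6]))  # [17, 39]
```

Each leaf writes only its own entry y[i], so the halves that run concurrently
never update the same memory location.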

To calculate the work $T_1(n)$ of Mat-Vec on an $n \times n$ matrix, we simply compute
the running time of its serialization, which we obtain by replacing the parallel for
loops with ordinary for loops. Thus, we have $T_1(n) = \Theta(n^2)$, because the
quadratic running time of the doubly nested loops in lines 5-7 dominates. This analysis
seems to ignore the overhead for recursive spawning in implementing the parallel
loops, however. In fact, the overhead of recursive spawning does increase the work
of a parallel loop compared with that of its serialization, but not asymptotically.
To see why, observe that since the tree of recursive procedure instances is a full
binary tree, the number of internal nodes is 1 fewer than the number of leaves (see
Exercise B.5-3). Each internal node performs constant work to divide the iteration
range, and each leaf corresponds to an iteration of the loop, which takes at least
constant time ($\Theta(n)$ time in this case). Thus, we can amortize the overhead of
recursive spawning against the work of the iterations, contributing at most a constant
factor to the overall work.

As a practical matter, dynamic-multithreading concurrency platforms sometimes
coarsen the leaves of the recursion by executing several iterations in a single leaf,
either automatically or under programmer control, thereby reducing the overhead
of recursive spawning. This reduced overhead comes at the expense of reduced
parallelism, but if the computation has sufficient parallel slackness, near-perfect
linear speedup need not be sacrificed.

We must also account for the overhead of recursive spawning when analyzing the
span of a parallel-loop construct. Since the depth of recursive calling is logarithmic
in the number of iterations, for a parallel loop with n iterations in which the ith
iteration has span $\mathit{iter}_\infty(i)$, the span is

$$T_\infty(n) = \Theta(\lg n) + \max_{1 \le i \le n} \mathit{iter}_\infty(i) .$$

For example, for Mat-Vec on an $n \times n$ matrix, the parallel initialization loop in
lines 3-4 has span $\Theta(\lg n)$, because the recursive spawning dominates the
constant-time work of each iteration. The span of the doubly nested loops in lines 5-7
is $\Theta(n)$, because each iteration of the outer parallel for loop contains n iterations
of the inner (serial) for loop. The span of the remaining code in the procedure
is constant, and thus the span is dominated by the doubly nested loops, yielding
an overall span of $\Theta(n)$ for the whole procedure. Since the work is $\Theta(n^2)$, the
parallelism is $\Theta(n^2)/\Theta(n) = \Theta(n)$. (Exercise 27.1-6 asks you to provide an
implementation with even more parallelism.)
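
The span formula can be checked on small cases. The Python sketch below follows
the recursive halving of the iteration range and charges one unit per level of
recursion; the unit cost per level, the function name parallel_loop_span, and
the iteration costs passed in are assumptions made for illustration.

```python
def parallel_loop_span(lo, hi, iter_span):
    """Span of a parallel loop over iterations lo..hi (inclusive).

    iter_span(i) gives the span of iteration i. Each level of recursive
    halving is charged one unit, and the two halves run in parallel.
    """
    if lo == hi:
        return iter_span(lo)
    mid = (lo + hi) // 2
    return 1 + max(parallel_loop_span(lo, mid, iter_span),
                   parallel_loop_span(mid + 1, hi, iter_span))


n = 8
print(parallel_loop_span(1, n, lambda i: 1))  # initialization loop: 4
print(parallel_loop_span(1, n, lambda i: n))  # main loop: 11
```

With n = 8 the recursion is three levels deep, so the initialization loop has
span 3 + 1 and the main loop has span 3 + 8, matching the lg n term plus the
largest iteration span.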

Race  conditions 

A multithreaded algorithm is deterministic if it always does the same thing on the
same input, no matter how the instructions are scheduled on the multicore
computer. It is nondeterministic if its behavior might vary from run to run. Often, a
multithreaded algorithm that is intended to be deterministic fails to be, because it
contains a “determinacy race.”

Race conditions are the bane of concurrency. Famous race bugs include the
Therac-25 radiation therapy machine, which killed three people and injured
several others, and the North American Blackout of 2003, which left over 50 million
people without power. These pernicious bugs are notoriously hard to find. You can
run tests in the lab for days without a failure only to discover that your software
sporadically crashes in the field.

A  determinacy  race  occurs  when  two  logically  parallel  instructions  access  the 
same  memory  location  and  at  least  one  of  the  instructions  performs  a  write.  The 
following  procedure  illustrates  a  race  condition: 

Race-Example()
1  x = 0
2  parallel for i = 1 to 2
3      x = x + 1
4  print x

After initializing x to 0 in line 1, Race-Example creates two parallel strands,
each of which increments x in line 3. Although it might seem that Race-Example
should always print the value 2 (its serialization certainly does), it could
instead print the value 1. Let’s see how this anomaly might occur.

When  a  processor  increments  x,  the  operation  is  not  indivisible,  but  is  composed 
of  a  sequence  of  instructions: 

1.  Read  x  from  memory  into  one  of  the  processor’s  registers. 

2.  Increment  the  value  in  the  register. 

3.  Write  the  value  in  the  register  back  into  x  in  memory. 

Figure 27.5(a) illustrates a computation dag representing the execution of
Race-Example, with the strands broken down to individual instructions. Recall that
since an ideal parallel computer supports sequential consistency, we can view the
parallel execution of a multithreaded algorithm as an interleaving of instructions
that respects the dependencies in the dag. Part (b) of the figure shows the values
in an execution of the computation that elicits the anomaly. The value x is stored
in memory, and $r_1$ and $r_2$ are processor registers. In step 1, one of the processors
sets x to 0. In steps 2 and 3, processor 1 reads x from memory into its register $r_1$
and increments it, producing the value 1 in $r_1$. At that point, processor 2 comes
into the picture, executing instructions 4-6. Processor 2 reads x from memory into
register $r_2$; increments it, producing the value 1 in $r_2$; and then stores this value
into x, setting x to 1. Now, processor 1 resumes with step 7, storing the value 1
in $r_1$ into x, which leaves the value of x unchanged. Therefore, step 8 prints the
value 1, rather than 2, as the serialization would print.

We can see what has happened. If the effect of the parallel execution were that
processor 1 executed all its instructions before processor 2, the value 2 would be
printed. Conversely, if the effect were that processor 2 executed all its instructions
before processor 1, the value 2 would still be printed. When the instructions of the
two processors execute at the same time, however, it is possible, as in this example
execution, that one of the updates to x is lost.

Figure 27.5  Illustration of the determinacy race in Race-Example. (a) A computation dag
showing the dependencies among individual instructions. The processor registers are $r_1$ and $r_2$.
Instructions unrelated to the race, such as the implementation of loop control, are omitted. (b) An
execution sequence that elicits the bug, showing the values of x in memory and registers $r_1$ and $r_2$
for each step in the execution sequence.

Of course, many executions do not elicit the bug. For example, if the execution
order were (1, 2, 3, 7, 4, 5, 6, 8) or (1, 4, 5, 6, 2, 3, 7, 8), we would get the
correct result. That’s the problem with determinacy races. Generally, most orderings
produce correct results, such as any in which the instructions on the left execute
before the instructions on the right, or vice versa. But some orderings generate
improper results when the instructions interleave. Consequently, races can be
extremely hard to test for. You can run tests for days and never see the bug, only to
experience a catastrophic system crash in the field when the outcome is critical.
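
The three execution orders discussed above can be replayed mechanically. The
Python sketch below models the eight instructions with the numbering used in the
walkthrough of Figure 27.5 (1 initializes x, 2-3 and 7 belong to processor 1,
4-6 belong to processor 2, and 8 prints); the function name run_race_example and
the encoding of an execution as a list of step numbers are assumptions made for
this sketch.

```python
def run_race_example(order):
    """Replay the instructions of Race-Example in the given order.

    Returns the value that the final print instruction would output.
    """
    x = r1 = r2 = printed = None
    for step in order:
        if step == 1:
            x = 0            # initialize x
        elif step == 2:
            r1 = x           # processor 1 reads x into r1
        elif step == 3:
            r1 += 1          # processor 1 increments r1
        elif step == 4:
            r2 = x           # processor 2 reads x into r2
        elif step == 5:
            r2 += 1          # processor 2 increments r2
        elif step == 6:
            x = r2           # processor 2 writes r2 back to x
        elif step == 7:
            x = r1           # processor 1 writes r1 back to x
        elif step == 8:
            printed = x      # print x
        else:
            raise ValueError(f"unknown instruction {step}")
    return printed


print(run_race_example([1, 2, 3, 4, 5, 6, 7, 8]))  # 1 (an update is lost)
print(run_race_example([1, 2, 3, 7, 4, 5, 6, 8]))  # 2
print(run_race_example([1, 4, 5, 6, 2, 3, 7, 8]))  # 2
```

The lost update appears exactly when one processor reads x before the other
processor has written its result back.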

Although we can cope with races in a variety of ways, including using
mutual-exclusion locks and other methods of synchronization, for our purposes, we shall
simply ensure that strands that operate in parallel are independent: they have no
determinacy races among them. Thus, in a parallel for construct, all the iterations
should be independent. Between a spawn and the corresponding sync, the code
of the spawned child should be independent of the code of the parent, including
code executed by additional spawned or called children. Note that arguments to a
spawned child are evaluated in the parent before the actual spawn occurs, and thus
the evaluation of arguments to a spawned subroutine is in series with any accesses
to those arguments after the spawn.

As an example of how easy it is to generate code with races, here is a faulty
implementation of multithreaded matrix-vector multiplication that achieves a span
of $\Theta(\lg n)$ by parallelizing the inner for loop:

Mat-Vec-Wrong(A, x)
1  n = A.rows
2  let y be a new vector of length n
3  parallel for i = 1 to n
4      y_i = 0
5  parallel for i = 1 to n
6      parallel for j = 1 to n
7          y_i = y_i + a_ij x_j
8  return y

This procedure is, unfortunately, incorrect due to races on updating $y_i$ in line 7,
which executes concurrently for all n values of j. Exercise 27.1-6 asks you to give
a correct implementation with $\Theta(\lg n)$ span.

A multithreaded algorithm with races can sometimes be correct. As an
example, two parallel threads might store the same value into a shared variable, and it
wouldn’t matter which stored the value first. Generally, however, we shall consider
code with races to be illegal.

A  chess  lesson 

We close this section with a true story that occurred during the development of
the world-class multithreaded chess-playing program *Socrates [80], although the
timings below have been simplified for exposition. The program was prototyped
on a 32-processor computer but was ultimately to run on a supercomputer with 512
processors. At one point, the developers incorporated an optimization into the
program that reduced its running time on an important benchmark on the 32-processor
machine from $T_{32} = 65$ seconds to $T'_{32} = 40$ seconds. Yet, the developers used
the work and span performance measures to conclude that the optimized version,
which was faster on 32 processors, would actually be slower than the original
version on 512 processors. As a result, they abandoned the “optimization.”

Here is their analysis. The original version of the program had work $T_1 = 2048$
seconds and span $T_\infty = 1$ second. If we treat inequality (27.4) as an equation,
$T_P = T_1/P + T_\infty$, and use it as an approximation to the running time on P
processors, we see that indeed $T_{32} = 2048/32 + 1 = 65$. With the optimization, the
work became $T'_1 = 1024$ seconds and the span became $T'_\infty = 8$ seconds. Again
using our approximation, we get $T'_{32} = 1024/32 + 8 = 40$.

The relative speeds of the two versions switch when we calculate the running
times on 512 processors, however. In particular, we have $T_{512} = 2048/512 + 1 = 5$
seconds, and $T'_{512} = 1024/512 + 8 = 10$ seconds. The optimization that sped up
the program on 32 processors would have made the program twice as slow on 512
processors! The optimized version’s span of 8, which was not the dominant term in
the running time on 32 processors, became the dominant term on 512 processors,
nullifying the advantage from using more processors.

The  moral  of  the  story  is  that  work  and  span  can  provide  a  better  means  of 
extrapolating  performance  than  can  measured  running  times. 
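
The arithmetic in this story is easy to reproduce. The following Python sketch is an illustration only: the function and constant names are ours, not part of any chess program, and it simply evaluates the approximation $T_P = T_1/P + T_\infty$ for both versions.

```python
def estimated_time(work: float, span: float, processors: int) -> float:
    """Approximate T_P by treating T_P <= T_1/P + T_inf as an equation."""
    return work / processors + span


# (work T_1, span T_inf) in seconds, taken from the story above.
ORIGINAL = (2048.0, 1.0)
OPTIMIZED = (1024.0, 8.0)


def compare(processors: int) -> tuple:
    """Return (original, optimized) estimated running times on P processors."""
    return (estimated_time(*ORIGINAL, processors),
            estimated_time(*OPTIMIZED, processors))


if __name__ == "__main__":
    for p in (32, 512):
        original, optimized = compare(p)
        print(f"P = {p}: original {original:g} s, optimized {optimized:g} s")
```

On 32 processors the span term is small next to $T_1/P$, so halving the work wins; on 512 processors the span term dominates the estimate for the optimized version.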

Exercises 


27.1-1 

Suppose that we spawn P-Fib($n - 2$) in line 4 of P-Fib, rather than calling it
as is done in the code. What is the impact on the asymptotic work, span, and
parallelism?


27.1-2 

Draw the computation dag that results from executing P-Fib(5). Assuming that
each strand in the computation takes unit time, what are the work, span, and
parallelism of the computation? Show how to schedule the dag on 3 processors using
greedy scheduling by labeling each strand with the time step in which it is executed.


27.1-3 

Prove  that  a  greedy  scheduler  achieves  the  following  time  bound,  which  is  slightly 
stronger  than  the  bound  proven  in  Theorem  27.1: 


$$T_P \le \frac{T_1 - T_\infty}{P} + T_\infty . \tag{27.5}$$


27.1-4 

Construct  a  computation  dag  for  which  one  execution  of  a  greedy  scheduler  can 
take  nearly  twice  the  time  of  another  execution  of  a  greedy  scheduler  on  the  same 
number  of  processors.  Describe  how  the  two  executions  would  proceed. 


27.1-5 

Professor Karan measures her deterministic multithreaded algorithm on 4, 10,
and 64 processors of an ideal parallel computer using a greedy scheduler. She
claims that the three runs yielded $T_4 = 80$ seconds, $T_{10} = 42$ seconds, and
$T_{64} = 10$ seconds. Argue that the professor is either lying or incompetent. (Hint:
Use the work law (27.2), the span law (27.3), and inequality (27.5) from
Exercise 27.1-3.)


27.1-6

Give a multithreaded algorithm to multiply an $n \times n$ matrix by an $n$-vector that
achieves $\Theta(n^2/\lg n)$ parallelism while maintaining $\Theta(n^2)$ work.


27.1-7 

Consider the following multithreaded pseudocode for transposing an $n \times n$ matrix $A$
in place:

P-Transpose(A)
1  n = A.rows
2  parallel for j = 2 to n
3      parallel for i = 1 to j - 1
4          exchange a_ij with a_ji

Analyze  the  work,  span,  and  parallelism  of  this  algorithm. 


27.1-8 

Suppose that we replace the parallel for loop in line 3 of P-Transpose (see
Exercise 27.1-7) with an ordinary for loop. Analyze the work, span, and parallelism
of the resulting algorithm.


27.1-9 

For how many processors do the two versions of the chess program run equally
fast, assuming that $T_P = T_1/P + T_\infty$?


27.2  Multithreaded  matrix  multiplication 

In  this  section,  we  examine  how  to  multithread  matrix  multiplication,  a  problem 
whose  serial  running  time  we  studied  in  Section  4.2.  We’ll  look  at  multithreaded 
algorithms  based  on  the  standard  triply  nested  loop,  as  well  as  divide-and-conquer 
algorithms. 

Multithreaded  matrix  multiplication 

The first algorithm we study is the straightforward algorithm based on parallelizing
the loops in the procedure Square-Matrix-Multiply on page 75:

P-Square-Matrix-Multiply(A, B)
1  n = A.rows
2  let C be a new n × n matrix
3  parallel for i = 1 to n
4      parallel for j = 1 to n
5          c_ij = 0
6          for k = 1 to n
7              c_ij = c_ij + a_ik · b_kj
8  return C

To analyze this algorithm, observe that since the serialization of the algorithm is
just Square-Matrix-Multiply, the work is therefore simply $T_1(n) = \Theta(n^3)$,
the same as the running time of Square-Matrix-Multiply. The span is
$T_\infty(n) = \Theta(n)$, because it follows a path down the tree of recursion for the
parallel for loop starting in line 3, then down the tree of recursion for the parallel
for loop starting in line 4, and then executes all $n$ iterations of the ordinary for loop
starting in line 6, resulting in a total span of $\Theta(\lg n) + \Theta(\lg n) + \Theta(n) = \Theta(n)$.
Thus, the parallelism is $\Theta(n^3)/\Theta(n) = \Theta(n^2)$. Exercise 27.2-3 asks you to
parallelize the inner loop to obtain a parallelism of $\Theta(n^3/\lg n)$, which you cannot do
straightforwardly using parallel for, because you would create races.
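
As a rough illustration of the loop structure, here is a Python sketch. It is not the book's procedure: it assumes 0-based list-of-lists matrices, flattens the two parallel for loops into one set of independent tasks (one per output entry), and uses a thread pool from the standard library as a stand-in for the multithreaded runtime. The helper name `compute_entry` is ours.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def p_square_matrix_multiply(A, B):
    """Multiply two n x n matrices, one independent task per output entry."""
    n = len(A)
    C = [[0] * n for _ in range(n)]

    def compute_entry(index):
        i, j = index
        total = 0
        for k in range(n):            # ordinary serial loop (line 6)
            total += A[i][k] * B[k][j]
        C[i][j] = total               # each task writes a distinct entry

    with ThreadPoolExecutor() as pool:
        # Consuming the iterator waits for every task and surfaces
        # any exception raised inside a worker.
        list(pool.map(compute_entry, product(range(n), repeat=2)))
    return C
```

Because every task writes only its own entry of `C`, the tasks do not race. Running the `k` loop as parallel tasks that all update the same entry would introduce exactly the race the text warns about.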

A  divide-and-conquer  multithreaded  algorithm  for  matrix  multiplication 

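As we learned in Section 4.2, we can multiply $n \times n$ matrices serially in time
$\Theta(n^{\lg 7}) = O(n^{2.81})$ using Strassen’s divide-and-conquer strategy, which motivates
us to look at multithreading such an algorithm. We begin, as we did in Section 4.2,
with multithreading a simpler divide-and-conquer algorithm.

Recall from page 77 that the Square-Matrix-Multiply-Recursive procedure,
which multiplies two $n \times n$ matrices $A$ and $B$ to produce the $n \times n$ matrix $C$,
relies on partitioning each of the three matrices into four $n/2 \times n/2$ submatrices:

$$
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad
B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}, \quad
C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} .
$$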


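Then, we can write the matrix product as

$$
\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}
= \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
  \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}
= \begin{pmatrix} A_{11}B_{11} & A_{11}B_{12} \\ A_{21}B_{11} & A_{21}B_{12} \end{pmatrix}
+ \begin{pmatrix} A_{12}B_{21} & A_{12}B_{22} \\ A_{22}B_{21} & A_{22}B_{22} \end{pmatrix} .
\tag{27.6}
$$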

Thus, to multiply two $n \times n$ matrices, we perform eight multiplications of $n/2 \times n/2$
matrices and one addition of $n \times n$ matrices. The following pseudocode implements
this divide-and-conquer strategy using nested parallelism. Unlike the
Square-Matrix-Multiply-Recursive procedure on which it is based,
P-Matrix-Multiply-Recursive takes the output matrix as a parameter to avoid allocating
matrices unnecessarily.

P-Matrix-Multiply-Recursive(C, A, B)
 1  n = A.rows
 2  if n == 1
 3      c_11 = a_11 b_11
 4  else let T be a new n × n matrix
 5      partition A, B, C, and T into n/2 × n/2 submatrices
            A_11, A_12, A_21, A_22; B_11, B_12, B_21, B_22; C_11, C_12, C_21, C_22;
            and T_11, T_12, T_21, T_22; respectively
 6      spawn P-Matrix-Multiply-Recursive(C_11, A_11, B_11)
 7      spawn P-Matrix-Multiply-Recursive(C_12, A_11, B_12)
 8      spawn P-Matrix-Multiply-Recursive(C_21, A_21, B_11)
 9      spawn P-Matrix-Multiply-Recursive(C_22, A_21, B_12)
10      spawn P-Matrix-Multiply-Recursive(T_11, A_12, B_21)
11      spawn P-Matrix-Multiply-Recursive(T_12, A_12, B_22)
12      spawn P-Matrix-Multiply-Recursive(T_21, A_22, B_21)
13      P-Matrix-Multiply-Recursive(T_22, A_22, B_22)
14      sync
15      parallel for i = 1 to n
16          parallel for j = 1 to n
17              c_ij = c_ij + t_ij

Line 3 handles the base case, where we are multiplying $1 \times 1$ matrices. We handle
the recursive case in lines 4-17. We allocate a temporary matrix $T$ in line 4, and
line 5 partitions each of the matrices $A$, $B$, $C$, and $T$ into $n/2 \times n/2$ submatrices.
(As with Square-Matrix-Multiply-Recursive on page 77, we gloss over
the minor issue of how to use index calculations to represent submatrix sections
of a matrix.) The recursive call in line 6 sets the submatrix $C_{11}$ to the submatrix
product $A_{11}B_{11}$, so that $C_{11}$ equals the first of the two terms that form its sum in
equation (27.6). Similarly, lines 7-9 set $C_{12}$, $C_{21}$, and $C_{22}$ to the first of the two
terms that equal their sums in equation (27.6). Line 10 sets the submatrix $T_{11}$ to
the submatrix product $A_{12}B_{21}$, so that $T_{11}$ equals the second of the two terms that
form $C_{11}$’s sum. Lines 11-13 set $T_{12}$, $T_{21}$, and $T_{22}$ to the second of the two terms
that form the sums of $C_{12}$, $C_{21}$, and $C_{22}$, respectively. The first seven recursive
calls are spawned, and the last one runs in the main strand. The sync statement in
line 14 ensures that all the submatrix products in lines 6-13 have been computed,
after which we add the products from $T$ into $C$ using the doubly nested parallel
for loops in lines 15-17.
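
To make the index calculations concrete, the following Python sketch gives the serialization of the procedure (spawn and sync removed, parallel for loops replaced by ordinary loops). It is an illustration under stated assumptions: matrices are 0-based lists of lists, $n$ is an exact power of 2, and the offset parameters `ci, cj, ai, aj, bi, bj` are our own device for representing submatrices without copying.

```python
def p_matrix_multiply_recursive(C, A, B, n,
                                ci=0, cj=0, ai=0, aj=0, bi=0, bj=0):
    """Set the n x n block of C at (ci, cj) to the product of the
    n x n blocks of A at (ai, aj) and of B at (bi, bj)."""
    if n == 1:
        C[ci][cj] = A[ai][aj] * B[bi][bj]
        return
    h = n // 2
    T = [[0] * n for _ in range(n)]       # temporary matrix (line 4)
    for i in (0, 1):
        for j in (0, 1):
            # Lines 6-9: C_ij = A_i1 * B_1j.
            p_matrix_multiply_recursive(
                C, A, B, h,
                ci + i * h, cj + j * h, ai + i * h, aj, bi, bj + j * h)
            # Lines 10-13: T_ij = A_i2 * B_2j.
            p_matrix_multiply_recursive(
                T, A, B, h,
                i * h, j * h, ai + i * h, aj + h, bi + h, bj + j * h)
    # The sync in line 14 would go here in a multithreaded version.
    for i in range(n):                    # lines 15-17: C = C + T
        for j in range(n):
            C[ci + i][cj + j] += T[i][j]
```

The eight recursive calls write to eight disjoint blocks (four in `C`, four in `T`), which is why they can safely be spawned in the multithreaded version.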

We first analyze the work $M_1(n)$ of the P-Matrix-Multiply-Recursive
procedure, echoing the serial running-time analysis of its progenitor
Square-Matrix-Multiply-Recursive. In the recursive case, we partition in $\Theta(1)$ time,
perform eight recursive multiplications of $n/2 \times n/2$ matrices, and finish up with
the $\Theta(n^2)$ work from adding two $n \times n$ matrices. Thus, the recurrence for the
work $M_1(n)$ is

$$M_1(n) = 8M_1(n/2) + \Theta(n^2) = \Theta(n^3)$$

by case 1 of the master theorem. In other words, the work of our multithreaded
algorithm is asymptotically the same as the running time of the procedure
Square-Matrix-Multiply in Section 4.2, with its triply nested loops.

To determine the span $M_\infty(n)$ of P-Matrix-Multiply-Recursive, we first
observe that the span for partitioning is $\Theta(1)$, which is dominated by the $\Theta(\lg n)$
span of the doubly nested parallel for loops in lines 15-17. Because the eight
parallel recursive calls all execute on matrices of the same size, the maximum span
for any recursive call is just the span of any one. Hence, the recurrence for the
span $M_\infty(n)$ of P-Matrix-Multiply-Recursive is

$$M_\infty(n) = M_\infty(n/2) + \Theta(\lg n) . \tag{27.7}$$

This recurrence does not fall under any of the cases of the master theorem, but
it does meet the condition of Exercise 4.6-2. By Exercise 4.6-2, therefore, the
solution to recurrence (27.7) is $M_\infty(n) = \Theta(\lg^2 n)$.
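
As an informal check on this solution, we can unroll recurrence (27.7) directly. The sketch below assumes that $n$ is an exact power of 2 and replaces the $\Theta(\lg n)$ term by $c \lg n$ for a hypothetical constant $c > 0$; it is a sanity check rather than a substitute for Exercise 4.6-2.

$$
\begin{aligned}
M_\infty(n) &= M_\infty(1) + \sum_{i=0}^{\lg n - 1} c \lg(n/2^i) \\
            &= \Theta(1) + c \sum_{i=0}^{\lg n - 1} (\lg n - i) \\
            &= \Theta(1) + c \sum_{k=1}^{\lg n} k \\
            &= \Theta(1) + c \, \frac{\lg n \, (\lg n + 1)}{2} \\
            &= \Theta(\lg^2 n) .
\end{aligned}
$$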

Now that we know the work and span of P-Matrix-Multiply-Recursive,
we can compute its parallelism as $M_1(n)/M_\infty(n) = \Theta(n^3/\lg^2 n)$, which is very
high.

Multithreading  Strassen’s  method 

To  multithread  Strassen’s  algorithm,  we  follow  the  same  general  outline  as  on 
page  79,  only  using  nested  parallelism: 

1. Divide the input matrices $A$ and $B$ and output matrix $C$ into $n/2 \times n/2$
submatrices, as in equation (27.6). This step takes $\Theta(1)$ work and span by index
calculation.

2. Create 10 matrices $S_1, S_2, \ldots, S_{10}$, each of which is $n/2 \times n/2$ and is the sum
or difference of two matrices created in step 1. We can create all 10 matrices
with $\Theta(n^2)$ work and $\Theta(\lg n)$ span by using doubly nested parallel for loops.

3. Using the submatrices created in step 1 and the 10 matrices created in
step 2, recursively spawn the computation of seven $n/2 \times n/2$ matrix products
$P_1, P_2, \ldots, P_7$.

4. Compute the desired submatrices $C_{11}$, $C_{12}$, $C_{21}$, $C_{22}$ of the result matrix $C$ by
adding and subtracting various combinations of the $P_i$ matrices, once again
using doubly nested parallel for loops. We can compute all four submatrices
with $\Theta(n^2)$ work and $\Theta(\lg n)$ span.

To analyze this algorithm, we first observe that since the serialization is the
same as the original serial algorithm, the work is just the running time of the
serialization, namely, $\Theta(n^{\lg 7})$. As for P-Matrix-Multiply-Recursive, we
can devise a recurrence for the span. In this case, seven recursive calls execute
in parallel, but since they all operate on matrices of the same size, we obtain
the same recurrence (27.7) as we did for P-Matrix-Multiply-Recursive,
which has solution $\Theta(\lg^2 n)$. Thus, the parallelism of multithreaded Strassen’s
method is $\Theta(n^{\lg 7}/\lg^2 n)$, which is high, though slightly less than the parallelism
of P-Matrix-Multiply-Recursive.

Exercises 


27.2-1 

Draw the computation dag for computing P-Square-Matrix-Multiply on $2 \times 2$
matrices,  labeling  how  the  vertices  in  your  diagram  correspond  to  strands  in  the 
execution  of  the  algorithm.  Use  the  convention  that  spawn  and  call  edges  point 
downward,  continuation  edges  point  horizontally  to  the  right,  and  return  edges 
point  upward.  Assuming  that  each  strand  takes  unit  time,  analyze  the  work,  span, 
and  parallelism  of  this  computation. 


27.2-2 

Repeat  Exercise  27.2-1  for  P-Matrix-Multiply-Recursive. 


27.2-3 

Give pseudocode for a multithreaded algorithm that multiplies two $n \times n$ matrices
with work $\Theta(n^3)$ but span only $\Theta(\lg n)$. Analyze your algorithm.


27.2-4 

Give pseudocode for an efficient multithreaded algorithm that multiplies a $p \times q$
matrix by a $q \times r$ matrix. Your algorithm should be highly parallel even if any of
$p$, $q$, and $r$ are 1. Analyze your algorithm.


27.2-5

Give pseudocode for an efficient multithreaded algorithm that transposes an $n \times n$
matrix in place by using divide-and-conquer to divide the matrix recursively into
four $n/2 \times n/2$ submatrices. Analyze your algorithm.


27.2-6 

Give  pseudocode  for  an  efficient  multithreaded  implementation  of  the  Floyd- 
Warshall  algorithm  (see  Section  25.2),  which  computes  shortest  paths  between  all 
pairs  of  vertices  in  an  edge-weighted  graph.  Analyze  your  algorithm. 


27.3  Multithreaded  merge  sort 

We first saw serial merge sort in Section 2.3.1, and in Section 2.3.2 we analyzed its
running time and showed it to be $\Theta(n \lg n)$. Because merge sort already uses the
divide-and-conquer paradigm, it seems like a terrific candidate for multithreading
using nested parallelism. We can easily modify the pseudocode so that the first
recursive call is spawned:

Merge-Sort'(A, p, r)
1  if p < r
2      q = ⌊(p + r)/2⌋
3      spawn Merge-Sort'(A, p, q)
4      Merge-Sort'(A, q + 1, r)
5      sync
6      Merge(A, p, q, r)

Like its serial counterpart, Merge-Sort' sorts the subarray $A[p .. r]$. After the
two recursive subroutines in lines 3 and 4 have completed, which is ensured by the
sync statement in line 5, Merge-Sort' calls the same Merge procedure as on
page 31.

Let us analyze Merge-Sort'. To do so, we first need to analyze Merge. Recall
that its serial running time to merge $n$ elements is $\Theta(n)$. Because Merge is
serial, both its work and its span are $\Theta(n)$. Thus, the following recurrence
characterizes the work $MS'_1(n)$ of Merge-Sort' on $n$ elements:

$$MS'_1(n) = 2\,MS'_1(n/2) + \Theta(n) = \Theta(n \lg n) ,$$

which is the same as the serial running time of merge sort. Since the two recursive
calls of Merge-Sort' can run in parallel, the span $MS'_\infty$ is given by the recurrence

$$MS'_\infty(n) = MS'_\infty(n/2) + \Theta(n) = \Theta(n) .$$

Thus, the parallelism of Merge-Sort' comes to $MS'_1(n)/MS'_\infty(n) = \Theta(\lg n)$,
which is an unimpressive amount of parallelism. To sort 10 million elements, for
example, it might achieve linear speedup on a few processors, but it would not
scale up effectively to hundreds of processors.

Figure 27.6 The idea behind the multithreaded merging of two sorted subarrays $T[p_1 .. r_1]$
and $T[p_2 .. r_2]$ into the subarray $A[p_3 .. r_3]$. Letting $x = T[q_1]$ be the median of $T[p_1 .. r_1]$ and $q_2$
be the place in $T[p_2 .. r_2]$ such that $x$ would fall between $T[q_2 - 1]$ and $T[q_2]$, every element in
subarrays $T[p_1 .. q_1 - 1]$ and $T[p_2 .. q_2 - 1]$ (lightly shaded) is less than or equal to $x$, and every
element in the subarrays $T[q_1 + 1 .. r_1]$ and $T[q_2 + 1 .. r_2]$ (heavily shaded) is at least $x$. To merge,
we compute the index $q_3$ where $x$ belongs in $A[p_3 .. r_3]$, copy $x$ into $A[q_3]$, and then recursively
merge $T[p_1 .. q_1 - 1]$ with $T[p_2 .. q_2 - 1]$ into $A[p_3 .. q_3 - 1]$ and $T[q_1 + 1 .. r_1]$ with $T[q_2 .. r_2]$
into $A[q_3 + 1 .. r_3]$.

You  probably  have  already  figured  out  where  the  parallelism  bottleneck  is  in 
this  multithreaded  merge  sort:  the  serial  Merge  procedure.  Although  merging 
might  initially  seem  to  be  inherently  serial,  we  can,  in  fact,  fashion  a  multithreaded 
version  of  it  by  using  nested  parallelism. 

Our divide-and-conquer strategy for multithreaded merging, which is
illustrated in Figure 27.6, operates on subarrays of an array $T$. Suppose that we
are merging the two sorted subarrays $T[p_1 .. r_1]$ of length $n_1 = r_1 - p_1 + 1$
and $T[p_2 .. r_2]$ of length $n_2 = r_2 - p_2 + 1$ into another subarray $A[p_3 .. r_3]$, of
length $n_3 = r_3 - p_3 + 1 = n_1 + n_2$. Without loss of generality, we make the
simplifying assumption that $n_1 \ge n_2$.

We first find the middle element $x = T[q_1]$ of the subarray $T[p_1 .. r_1]$,
where $q_1 = \lfloor (p_1 + r_1)/2 \rfloor$. Because the subarray is sorted, $x$ is a median
of $T[p_1 .. r_1]$: every element in $T[p_1 .. q_1 - 1]$ is no more than $x$, and every
element in $T[q_1 + 1 .. r_1]$ is no less than $x$. We then use binary search to find the
index $q_2$ in the subarray $T[p_2 .. r_2]$ so that the subarray would still be sorted if we
inserted $x$ between $T[q_2 - 1]$ and $T[q_2]$.

We next merge the original subarrays $T[p_1 .. r_1]$ and $T[p_2 .. r_2]$ into $A[p_3 .. r_3]$
as follows:

1. Set $q_3 = p_3 + (q_1 - p_1) + (q_2 - p_2)$.

2. Copy $x$ into $A[q_3]$.

3. Recursively merge $T[p_1 .. q_1 - 1]$ with $T[p_2 .. q_2 - 1]$, and place the result into
the subarray $A[p_3 .. q_3 - 1]$.

4. Recursively merge $T[q_1 + 1 .. r_1]$ with $T[q_2 .. r_2]$, and place the result into the
subarray $A[q_3 + 1 .. r_3]$.

When we compute $q_3$, the quantity $q_1 - p_1$ is the number of elements in the subarray
$T[p_1 .. q_1 - 1]$, and the quantity $q_2 - p_2$ is the number of elements in the subarray
$T[p_2 .. q_2 - 1]$. Thus, their sum is the number of elements that end up before $x$ in
the subarray $A[p_3 .. r_3]$.

The base case occurs when $n_1 = n_2 = 0$, in which case we have no work
to do to merge the two empty subarrays. Since we have assumed that the
subarray $T[p_1 .. r_1]$ is at least as long as $T[p_2 .. r_2]$, that is, $n_1 \ge n_2$, we can check
for the base case by just checking whether $n_1 = 0$. We must also ensure that the
recursion properly handles the case when only one of the two subarrays is empty,
which, by our assumption that $n_1 \ge n_2$, must be the subarray $T[p_2 .. r_2]$.

Now, let’s put these ideas into pseudocode. We start with the binary search,
which we express serially. The procedure Binary-Search($x$, $T$, $p$, $r$) takes a
key $x$ and a subarray $T[p .. r]$, and it returns one of the following:

•  If $T[p .. r]$ is empty ($r < p$), then it returns the index $p$.

•  If $x \le T[p]$, and hence less than or equal to all the elements of $T[p .. r]$, then
it returns the index $p$.

•  If $x > T[p]$, then it returns the largest index $q$ in the range $p < q \le r + 1$ such
that $T[q - 1] < x$.

Here  is  the  pseudocode: 

Binary-Search(x, T, p, r)
1  low = p
2  high = max(p, r + 1)
3  while low < high
4      mid = ⌊(low + high)/2⌋
5      if x ≤ T[mid]
6          high = mid
7      else low = mid + 1
8  return high

The call Binary-Search($x$, $T$, $p$, $r$) takes $\Theta(\lg n)$ serial time in the worst case,
where $n = r - p + 1$ is the size of the subarray on which it runs. (See
Exercise 2.3-5.) Since Binary-Search is a serial procedure, its worst-case work and
span are both $\Theta(\lg n)$.
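
For readers who want to experiment, here is a direct Python transcription of Binary-Search. It is a sketch that assumes 0-based indexing with an inclusive upper index `r`, mirroring the pseudocode's conventions; on a sorted list it returns the same index as the standard library call `bisect.bisect_left(T, x, p, max(p, r + 1))`.

```python
def binary_search(x, T, p, r):
    """Return the smallest index q in p..r+1 such that every element of
    T[p..q-1] is less than x (T[p..r] is sorted; r is inclusive)."""
    low = p
    high = max(p, r + 1)          # empty subarray (r < p) yields p
    while low < high:
        mid = (low + high) // 2
        if x <= T[mid]:
            high = mid            # x belongs at or before mid
        else:
            low = mid + 1         # x belongs strictly after mid
    return high
```

For example, with `T = [1, 3, 5, 7]`, the call `binary_search(4, T, 0, 3)` returns 2, since 4 would fall between `T[1]` and `T[2]`.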

We are now prepared to write pseudocode for the multithreaded merging
procedure itself. Like the Merge procedure on page 31, the P-Merge procedure
assumes that the two subarrays to be merged lie within the same array.
Unlike Merge, however, P-Merge does not assume that the two subarrays to
be merged are adjacent within the array. (That is, P-Merge does not require
that $p_2 = r_1 + 1$.) Another difference between Merge and P-Merge is that
P-Merge takes as an argument an output subarray $A$ into which the merged
values should be stored. The call P-Merge($T$, $p_1$, $r_1$, $p_2$, $r_2$, $A$, $p_3$) merges the sorted
subarrays $T[p_1 .. r_1]$ and $T[p_2 .. r_2]$ into the subarray $A[p_3 .. r_3]$, where
$r_3 = p_3 + (r_1 - p_1 + 1) + (r_2 - p_2 + 1) - 1 = p_3 + (r_1 - p_1) + (r_2 - p_2) + 1$ and
is not provided as an input.


P-Merge(T, p_1, r_1, p_2, r_2, A, p_3)
 1  n_1 = r_1 - p_1 + 1
 2  n_2 = r_2 - p_2 + 1
 3  if n_1 < n_2                  // ensure that n_1 ≥ n_2
 4      exchange p_1 with p_2
 5      exchange r_1 with r_2
 6      exchange n_1 with n_2
 7  if n_1 == 0                   // both empty?
 8      return
 9  else q_1 = ⌊(p_1 + r_1)/2⌋
10      q_2 = Binary-Search(T[q_1], T, p_2, r_2)
11      q_3 = p_3 + (q_1 - p_1) + (q_2 - p_2)
12      A[q_3] = T[q_1]
13      spawn P-Merge(T, p_1, q_1 - 1, p_2, q_2 - 1, A, p_3)
14      P-Merge(T, q_1 + 1, r_1, q_2, r_2, A, q_3 + 1)
15      sync

The P-Merge procedure works as follows. Lines 1-2 compute the lengths $n_1$
and $n_2$ of the subarrays $T[p_1 .. r_1]$ and $T[p_2 .. r_2]$, respectively. Lines 3-6
enforce the assumption that $n_1 \ge n_2$. Line 7 tests for the base case, where the
subarray $T[p_1 .. r_1]$ is empty (and hence so is $T[p_2 .. r_2]$), in which case we
simply return. Lines 9-15 implement the divide-and-conquer strategy. Line 9
computes the midpoint of $T[p_1 .. r_1]$, and line 10 finds the point $q_2$ in $T[p_2 .. r_2]$ such
that all elements in $T[p_2 .. q_2 - 1]$ are less than $T[q_1]$ (which corresponds to $x$)
and all the elements in $T[q_2 .. r_2]$ are at least as large as $T[q_1]$. Line 11
computes the index $q_3$ of the element that divides the output subarray $A[p_3 .. r_3]$ into
$A[p_3 .. q_3 - 1]$ and $A[q_3 + 1 .. r_3]$, and then line 12 copies $T[q_1]$ directly into $A[q_3]$.

Then, we recurse using nested parallelism. Line 13 spawns the first subproblem,
while line 14 calls the second subproblem in parallel. The sync statement in line 15
ensures that the subproblems have completed before the procedure returns. (Since
every procedure implicitly executes a sync before returning, we could have omitted
the sync statement in line 15, but including it is good coding practice.) There
is some cleverness in the coding to ensure that when the subarray $T[p_2 .. r_2]$ is
empty, the code operates correctly. The way it works is that on each recursive call,
a median element of $T[p_1 .. r_1]$ is placed into the output subarray, until $T[p_1 .. r_1]$
itself finally becomes empty, triggering the base case.
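
The following Python sketch gives the serialization of P-Merge, with the spawn and sync keywords dropped so that the two recursive calls simply run one after the other. It assumes 0-based indexing with inclusive upper indices, and it uses the standard library's `bisect_left` as a stand-in for Binary-Search; neither choice comes from the book.

```python
from bisect import bisect_left


def p_merge(T, p1, r1, p2, r2, A, p3):
    """Merge sorted T[p1..r1] and T[p2..r2] into A starting at index p3."""
    n1 = r1 - p1 + 1
    n2 = r2 - p2 + 1
    if n1 < n2:                           # ensure that n1 >= n2
        p1, p2 = p2, p1
        r1, r2 = r2, r1
        n1, n2 = n2, n1
    if n1 == 0:                           # both subarrays empty
        return
    q1 = (p1 + r1) // 2                   # median of the larger subarray
    # First index in T[p2..r2] whose element is >= T[q1].
    q2 = bisect_left(T, T[q1], p2, max(p2, r2 + 1))
    q3 = p3 + (q1 - p1) + (q2 - p2)       # final position of T[q1]
    A[q3] = T[q1]
    p_merge(T, p1, q1 - 1, p2, q2 - 1, A, p3)      # spawned in line 13
    p_merge(T, q1 + 1, r1, q2, r2, A, q3 + 1)      # called in line 14
```

The two recursive calls write to disjoint ranges of `A` (before and after index `q3`), which is what allows the first call to be spawned safely in the multithreaded version.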

Analysis  of  multithreaded  merging 

We first derive a recurrence for the span $PM_\infty(n)$ of P-Merge, where the two
subarrays contain a total of $n = n_1 + n_2$ elements. Because the spawn in line 13 and
the call in line 14 operate logically in parallel, we need examine only the costlier of
the two calls. The key is to understand that in the worst case, the maximum number
of elements in either of the recursive calls can be at most $3n/4$, which we see as
follows. Because lines 3-6 ensure that $n_2 \le n_1$, it follows that $n_2 = 2n_2/2 \le
(n_1 + n_2)/2 = n/2$. In the worst case, one of the two recursive calls merges
$\lfloor n_1/2 \rfloor$ elements of $T[p_1 .. r_1]$ with all $n_2$ elements of $T[p_2 .. r_2]$, and hence the
number of elements involved in the call is

$$
\begin{aligned}
\lfloor n_1/2 \rfloor + n_2 &\le n_1/2 + n_2/2 + n_2/2 \\
                            &= (n_1 + n_2)/2 + n_2/2 \\
                            &\le n/2 + n/4 \\
                            &= 3n/4 .
\end{aligned}
$$

Adding in the $\Theta(\lg n)$ cost of the call to Binary-Search in line 10, we obtain
the following recurrence for the worst-case span:

$$PM_\infty(n) = PM_\infty(3n/4) + \Theta(\lg n) . \tag{27.8}$$

(For the base case, the span is $\Theta(1)$, since lines 1-8 execute in constant time.)
This recurrence does not fall under any of the cases of the master theorem, but it
meets the condition of Exercise 4.6-2. Therefore, the solution to recurrence (27.8)
is $PM_\infty(n) = \Theta(\lg^2 n)$.

We now analyze the work $PM_1(n)$ of P-Merge on $n$ elements, which turns out
to be $\Theta(n)$. Since each of the $n$ elements must be copied from array $T$ to array $A$,
we have $PM_1(n) = \Omega(n)$. Thus, it remains only to show that $PM_1(n) = O(n)$.

We shall first derive a recurrence for the worst-case work. The binary search in
line 10 costs $\Theta(\lg n)$ in the worst case, which dominates the other work outside
of the recursive calls. For the recursive calls, observe that although the recursive
calls in lines 13 and 14 might merge different numbers of elements, together the
two recursive calls merge at most $n$ elements (actually $n - 1$ elements, since $T[q_1]$
does not participate in either recursive call). Moreover, as we saw in analyzing the
span, a recursive call operates on at most $3n/4$ elements. We therefore obtain the
recurrence

$$PM_1(n) = PM_1(\alpha n) + PM_1((1 - \alpha)n) + O(\lg n) , \tag{27.9}$$

where $\alpha$ lies in the range $1/4 \le \alpha \le 3/4$, and where we understand that the actual
value of $\alpha$ may vary for each level of recursion.

We prove that recurrence (27.9) has solution $PM_1 = O(n)$ via the substitution
method. Assume that $PM_1(n) \le c_1 n - c_2 \lg n$ for some positive constants $c_1$ and $c_2$.
Substituting gives us

$$
\begin{aligned}
PM_1(n) &\le (c_1 \alpha n - c_2 \lg(\alpha n)) + (c_1 (1 - \alpha) n - c_2 \lg((1 - \alpha) n)) + \Theta(\lg n) \\
        &= c_1 (\alpha + (1 - \alpha)) n - c_2 (\lg(\alpha n) + \lg((1 - \alpha) n)) + \Theta(\lg n) \\
        &= c_1 n - c_2 (\lg \alpha + \lg n + \lg(1 - \alpha) + \lg n) + \Theta(\lg n) \\
        &= c_1 n - c_2 \lg n - (c_2 (\lg n + \lg(\alpha(1 - \alpha))) - \Theta(\lg n)) \\
        &\le c_1 n - c_2 \lg n ,
\end{aligned}
$$

since we can choose $c_2$ large enough that $c_2(\lg n + \lg(\alpha(1 - \alpha)))$ dominates the
$\Theta(\lg n)$ term. Furthermore, we can choose $c_1$ large enough to satisfy the base
conditions of the recurrence. Since the work $PM_1(n)$ of P-Merge is both $\Omega(n)$
and $O(n)$, we have $PM_1(n) = \Theta(n)$.

The parallelism of P-Merge is $PM_1(n)/PM_\infty(n) = \Theta(n/\lg^2 n)$.


Multithreaded  merge  sort 

Now that we have a nicely parallelized multithreaded merging procedure, we can
incorporate it into a multithreaded merge sort. This version of merge sort is similar
to the Merge-Sort' procedure we saw earlier, but unlike Merge-Sort', it takes
as an argument an output subarray $B$, which will hold the sorted result. In
particular, the call P-Merge-Sort($A$, $p$, $r$, $B$, $s$) sorts the elements in $A[p .. r]$ and
stores them in $B[s .. s + r - p]$.

P-Merge-Sort(A, p, r, B, s)
 1  n = r - p + 1
 2  if n == 1
 3      B[s] = A[p]
 4  else let T[1 .. n] be a new array
 5      q = ⌊(p + r)/2⌋
 6      q' = q - p + 1
 7      spawn P-Merge-Sort(A, p, q, T, 1)
 8      P-Merge-Sort(A, q + 1, r, T, q' + 1)
 9      sync
10      P-Merge(T, 1, q', q' + 1, n, B, s)

After line 1 computes the number $n$ of elements in the input subarray $A[p .. r]$,
lines 2-3 handle the base case when the array has only 1 element. Lines 4-6 set
up for the recursive spawn in line 7 and call in line 8, which operate in parallel. In
particular, line 4 allocates a temporary array $T$ with $n$ elements to store the results
of the recursive merge sorting. Line 5 calculates the index $q$ of $A[p .. r]$ to divide
the elements into the two subarrays $A[p .. q]$ and $A[q + 1 .. r]$ that will be sorted
recursively, and line 6 goes on to compute the number $q'$ of elements in the first
subarray $A[p .. q]$, which line 8 uses to determine the starting index in $T$ of where
to store the sorted result of $A[q + 1 .. r]$. At that point, the spawn and recursive
call are made, followed by the sync in line 9, which forces the procedure to wait
until the spawned procedure is done. Finally, line 10 calls P-Merge to merge
the sorted subarrays, now in $T[1 .. q']$ and $T[q' + 1 .. n]$, into the output subarray
$B[s .. s + r - p]$.
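
Putting the pieces together, the sketch below is the serialization of P-Merge-Sort in Python, again with spawn and sync removed. It is an illustration under our own assumptions: 0-based indexing (so the temporary array is `T[0..n-1]` and the second half starts at index `q_len`, our name for $q'$), inclusive upper indices, a nonempty input subarray as in the pseudocode, and `bisect_left` standing in for Binary-Search. A compact copy of the merge is included so that the block runs on its own.

```python
from bisect import bisect_left


def p_merge(T, p1, r1, p2, r2, A, p3):
    """Merge sorted T[p1..r1] and T[p2..r2] into A starting at index p3."""
    if r1 - p1 < r2 - p2:                 # ensure the first is the longer
        p1, r1, p2, r2 = p2, r2, p1, r1
    if r1 < p1:                           # both subarrays empty
        return
    q1 = (p1 + r1) // 2
    q2 = bisect_left(T, T[q1], p2, max(p2, r2 + 1))
    q3 = p3 + (q1 - p1) + (q2 - p2)
    A[q3] = T[q1]
    p_merge(T, p1, q1 - 1, p2, q2 - 1, A, p3)
    p_merge(T, q1 + 1, r1, q2, r2, A, q3 + 1)


def p_merge_sort(A, p, r, B, s):
    """Sort A[p..r] (nonempty) and store the result in B[s..s+r-p]."""
    n = r - p + 1
    if n == 1:
        B[s] = A[p]
        return
    T = [None] * n                        # temporary array (line 4)
    q = (p + r) // 2
    q_len = q - p + 1                     # size of the first half (q')
    p_merge_sort(A, p, q, T, 0)           # spawned in line 7
    p_merge_sort(A, q + 1, r, T, q_len)   # called in line 8
    # The sync in line 9 would go here in a multithreaded version.
    p_merge(T, 0, q_len - 1, q_len, n - 1, B, s)
```

A typical call allocates the output array up front, for example `out = [None] * len(data)` followed by `p_merge_sort(data, 0, len(data) - 1, out, 0)`.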


Analysis  of  multithreaded  merge  sort 

We start by analyzing the work $PMS_1(n)$ of P-Merge-Sort, which is considerably
easier than analyzing the work of P-Merge. Indeed, the work is given by the
recurrence

$$
\begin{aligned}
PMS_1(n) &= 2\,PMS_1(n/2) + PM_1(n) \\
         &= 2\,PMS_1(n/2) + \Theta(n) .
\end{aligned}
$$

This recurrence is the same as the recurrence (4.4) for ordinary Merge-Sort
from Section 2.3.1 and has solution $PMS_1(n) = \Theta(n \lg n)$ by case 2 of the master
theorem.

We now derive and analyze a recurrence for the worst-case span $PMS_\infty(n)$.
Because the two recursive calls to P-Merge-Sort on lines 7 and 8 operate logically
in parallel, we can ignore one of them, obtaining the recurrence

$$
\begin{aligned}
PMS_\infty(n) &= PMS_\infty(n/2) + PM_\infty(n) \\
              &= PMS_\infty(n/2) + \Theta(\lg^2 n) .
\end{aligned}
\tag{27.10}
$$

As for recurrence (27.8), the master theorem does not apply to recurrence (27.10),
but Exercise 4.6-2 does. The solution is $PMS_\infty(n) = \Theta(\lg^3 n)$, and so the span of
P-Merge-Sort is $\Theta(\lg^3 n)$.

Parallel merging gives P-Merge-Sort a significant parallelism advantage over
Merge-Sort'. Recall that the parallelism of Merge-Sort', which calls the
serial Merge procedure, is only $\Theta(\lg n)$. For P-Merge-Sort, the parallelism is

$$PMS_1(n)/PMS_\infty(n) = \Theta(n \lg n)/\Theta(\lg^3 n) = \Theta(n/\lg^2 n) ,$$

which  is  much  better  both  in  theory  and  in  practice.  A  good  implementation  in 
practice  would  sacrifice  some  parallelism  by  coarsening  the  base  case  in  order  to 
reduce  the  constants  hidden  by  the  asymptotic  notation.  The  straightforward  way 
to  coarsen  the  base  case  is  to  switch  to  an  ordinary  serial  sort,  perhaps  quicksort, 
when  the  size  of  the  array  is  sufficiently  small. 
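
The coarsening idea can be sketched in Python as follows. This is a serial
illustration only, not the P-Merge-Sort procedure itself: the name
`coarsened_merge_sort` and the default cutoff of 32 are assumptions made for the
example, Python's built-in `sorted` stands in for the "ordinary serial sort", and
a serial merge replaces P-Merge.

```python
import heapq

def coarsened_merge_sort(a, cutoff=32):
    """Return a sorted copy of a, using a serial sort below the cutoff.

    cutoff is a hypothetical tuning parameter and must be at least 1.
    """
    if len(a) <= cutoff:
        # Coarsened base case: hand small inputs to an ordinary serial sort.
        return sorted(a)
    mid = len(a) // 2
    # In a multithreaded runtime the first call would be spawned and a sync
    # would precede the merge; here the two calls run one after the other.
    left = coarsened_merge_sort(a[:mid], cutoff)
    right = coarsened_merge_sort(a[mid:], cutoff)
    # Serial merge of the two sorted halves.
    return list(heapq.merge(left, right))
```

A suitable cutoff depends on the machine and the runtime's spawn overhead, so in
practice it would be chosen by measurement rather than fixed in advance.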

Exercises 


27.3-1 

Explain  how  to  coarsen  the  base  case  of  P-Merge. 


27.3-2 

Instead of finding a median element in the larger subarray, as P-Merge does,
consider a variant that finds a median element of all the elements in the two sorted
subarrays using the result of Exercise 9.3-8. Give pseudocode for an efficient
multithreaded merging procedure that uses this median-finding procedure.
Analyze your algorithm.


27.3-3

Give an efficient multithreaded algorithm for partitioning an array around a pivot,
as is done by the Partition procedure on page 171. You need not partition the
array in place. Make your algorithm as parallel as possible. Analyze your algorithm.
(Hint: You may need an auxiliary array and may need to make more than one pass
over the input elements.)

27.3-4

Give a multithreaded version of Recursive-FFT on page 911. Make your
implementation as parallel as possible. Analyze your algorithm.

27.3-5 *

Give a multithreaded version of Randomized-Select on page 216. Make your
implementation as parallel as possible. Analyze your algorithm. (Hint: Use the
partitioning algorithm from Exercise 27.3-3.)

27.3-6 *

Show  how  to  multithread  Select  from  Section  9.3.  Make  your  implementation  as 
parallel  as  possible.  Analyze  your  algorithm. 


Problems 


27-1  Implementing  parallel  loops  using  nested  parallelism 

Consider the following multithreaded algorithm for performing pairwise addition
on n-element arrays A[1 .. n] and B[1 .. n], storing the sums in C[1 .. n]:

Sum-Arrays(A, B, C)
1  parallel for i = 1 to A.length
2      C[i] = A[i] + B[i]

a.  Rewrite  the  parallel  loop  in  Sum-Arrays  using  nested  parallelism  (spawn 
and  sync)  in  the  manner  of  Mat-Vec-Main-Loop.  Analyze  the  parallelism 
of  your  implementation. 

Consider  the  following  alternative  implementation  of  the  parallel  loop,  which 
contains  a  value  grain-size  to  be  specified: 


Sum-Arrays'(A, B, C)
1  n = A.length
2  grain-size = ?        // to be determined
3  r = ⌈n/grain-size⌉
4  for k = 0 to r - 1
5      spawn Add-Subarray(A, B, C, k · grain-size + 1,
                          min((k + 1) · grain-size, n))
6  sync

Add-Subarray(A, B, C, i, j)
1  for k = i to j
2      C[k] = A[k] + B[k]

b. Suppose that we set grain-size = 1. What is the parallelism of this
implementation?

c.  Give  a  formula  for  the  span  of  Sum-Arrays'  in  terms  of  n  and  grain-size. 
Derive  the  best  value  for  grain-size  to  maximize  parallelism. 
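
The chunking performed by Sum-Arrays' can be sketched in Python with a thread
pool. This is only an illustration of the index arithmetic and the
spawn-then-sync structure: the function names are hypothetical, indices are
0-based, `grain_size` is left as a caller-supplied parameter (choosing it is the
subject of part (c)), and CPython threads will not actually speed up this
arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil

def add_subarray(a, b, c, i, j):
    """Serially store a[k] + b[k] into c[k] for k = i..j (0-based, inclusive)."""
    for k in range(i, j + 1):
        c[k] = a[k] + b[k]

def sum_arrays_chunked(a, b, grain_size):
    """Add a and b element-wise, handing out chunks of at most grain_size items."""
    n = len(a)
    c = [0] * n
    r = ceil(n / grain_size)  # number of chunks, as in line 3 of Sum-Arrays'
    with ThreadPoolExecutor() as pool:
        # One submitted task per chunk plays the role of each spawn.
        futures = [
            pool.submit(add_subarray, a, b, c,
                        k * grain_size,
                        min((k + 1) * grain_size, n) - 1)
            for k in range(r)
        ]
        # Waiting on every future plays the role of sync.
        for f in futures:
            f.result()
    return c
```

For example, `sum_arrays_chunked([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], 2)` splits
the work into chunks covering indices 0-1, 2-3, and 4.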

27-2  Saving  temporary  space  in  matrix  multiplication 

The P-Matrix-Multiply-Recursive procedure has the disadvantage that it
must allocate a temporary matrix T of size $n \times n$, which can adversely affect the
constants hidden by the $\Theta$-notation. The P-Matrix-Multiply-Recursive
procedure does have high parallelism, however. For example, ignoring the constants
in the $\Theta$-notation, the parallelism for multiplying $1000 \times 1000$ matrices comes to
approximately $1000^3/10^2 = 10^7$, since $\lg 1000 \approx 10$. Most parallel computers
have far fewer than 10 million processors.

a. Describe a recursive multithreaded algorithm that eliminates the need for the
temporary matrix T at the cost of increasing the span to $\Theta(n)$. (Hint: Compute
C = C + AB following the general strategy of P-Matrix-Multiply-Recursive,
but initialize C in parallel and insert a sync in a judiciously chosen
location.)

b.  Give  and  solve  recurrences  for  the  work  and  span  of  your  implementation. 

c. Analyze the parallelism of your implementation. Ignoring the constants in the
$\Theta$-notation, estimate the parallelism on $1000 \times 1000$ matrices. Compare with
the parallelism of P-Matrix-Multiply-Recursive.

27-3  Multithreaded  matrix  algorithms 

a. Parallelize the LU-Decomposition procedure on page 821 by giving
pseudocode for a multithreaded version of this algorithm. Make your
implementation as parallel as possible, and analyze its work, span, and parallelism.

b. Do the same for LUP-Decomposition on page 824.

c. Do the same for LUP-Solve on page 817.

d. Do the same for a multithreaded algorithm based on equation (28.13) for
inverting a symmetric positive-definite matrix.

27-4  Multithreading reductions and prefix computations

A $\otimes$-reduction of an array x[1 .. n], where $\otimes$ is an associative operator, is the value

\[
y = x[1] \otimes x[2] \otimes \cdots \otimes x[n] .
\]

The following procedure computes the $\otimes$-reduction of a subarray x[i .. j] serially.

Reduce(x, i, j)
1  y = x[i]
2  for k = i + 1 to j
3      y = y ⊗ x[k]
4  return y

a. Use nested parallelism to implement a multithreaded algorithm P-Reduce,
which performs the same function with $\Theta(n)$ work and $\Theta(\lg n)$ span. Analyze
your algorithm.

A related problem is that of computing a $\otimes$-prefix computation, sometimes
called a $\otimes$-scan, on an array x[1 .. n], where $\otimes$ is once again an associative
operator. The $\otimes$-scan produces the array y[1 .. n] given by

\[
\begin{aligned}
y[1] &= x[1] , \\
y[2] &= x[1] \otimes x[2] , \\
y[3] &= x[1] \otimes x[2] \otimes x[3] , \\
     &\;\;\vdots \\
y[n] &= x[1] \otimes x[2] \otimes x[3] \otimes \cdots \otimes x[n] ,
\end{aligned}
\]

that is, all prefixes of the array x “summed” using the $\otimes$ operator. The following
serial procedure Scan performs a $\otimes$-prefix computation:

Scan(x)
1  n = x.length
2  let y[1 .. n] be a new array
3  y[1] = x[1]
4  for i = 2 to n
5      y[i] = y[i - 1] ⊗ x[i]
6  return y
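
As a concrete reference point, the serial Scan procedure might be written in
Python as follows. This is a minimal sketch: the function name `scan`, the `op`
parameter (defaulting to addition), and the handling of an empty input are
assumptions added for the example, and indices are 0-based.

```python
from operator import add

def scan(x, op=add):
    """Serial prefix computation: y[i] = x[0] op x[1] op ... op x[i].

    op is assumed to be associative; it need not be commutative.
    """
    if not x:
        return []  # the pseudocode assumes n >= 1; an empty input yields []
    y = [x[0]]
    for i in range(1, len(x)):
        # Each entry depends on the previous one, which is why a plain
        # parallel for loop over i would create races.
        y.append(op(y[i - 1], x[i]))
    return y
```

With the default operator, `scan([1, 2, 3, 4])` produces the running sums; with
string concatenation, the operator is associative but not commutative, and the
left-to-right order of the operands matters.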

Unfortunately, multithreading Scan is not straightforward. For example, changing
the for loop to a parallel for loop would create races, since each iteration of the
loop body depends on the previous iteration. The following procedure P-Scan-1
performs the $\otimes$-prefix computation in parallel, albeit inefficiently:

P-Scan-1(x)
1  n = x.length
2  let y[1 .. n] be a new array
3  P-Scan-1-Aux(x, y, 1, n)
4  return y

P-Scan-1-Aux(x, y, i, j)
1  parallel for l = i to j
2      y[l] = P-Reduce(x, 1, l)

b. Analyze the work, span, and parallelism of P-Scan-1.

By using nested parallelism, we can obtain a more efficient $\otimes$-prefix
computation:

P-Scan-2(x)
1  n = x.length
2  let y[1 .. n] be a new array
3  P-Scan-2-Aux(x, y, 1, n)
4  return y

P-Scan-2-Aux(x, y, i, j)
1  if i == j
2      y[i] = x[i]
3  else k = ⌊(i + j)/2⌋
4      spawn P-Scan-2-Aux(x, y, i, k)
5      P-Scan-2-Aux(x, y, k + 1, j)
6      sync
7      parallel for l = k + 1 to j
8          y[l] = y[k] ⊗ y[l]

c. Argue that P-Scan-2 is correct, and analyze its work, span, and parallelism.

We can improve on both P-Scan-1 and P-Scan-2 by performing the $\otimes$-prefix
computation in two distinct passes over the data. On the first pass, we gather the
terms for various contiguous subarrays of x into a temporary array t, and on the
second pass we use the terms in t to compute the final result y. The following
pseudocode implements this strategy, but certain expressions have been omitted:

P-Scan-3(x)
1  n = x.length
2  let y[1 .. n] and t[1 .. n] be new arrays
3  y[1] = x[1]
4  if n > 1
5      P-Scan-Up(x, t, 2, n)
6      P-Scan-Down(x[1], x, t, y, 2, n)
7  return y

P-Scan-Up(x, t, i, j)
1  if i == j
2      return x[i]
3  else
4      k = ⌊(i + j)/2⌋
5      t[k] = spawn P-Scan-Up(x, t, i, k)
6      right = P-Scan-Up(x, t, k + 1, j)
7      sync
8      return ____                                  // fill in the blank

P-Scan-Down(v, x, t, y, i, j)
1  if i == j
2      y[i] = v ⊗ x[i]
3  else
4      k = ⌊(i + j)/2⌋
5      spawn P-Scan-Down(____, x, t, y, i, k)       // fill in the blank
6      P-Scan-Down(____, x, t, y, k + 1, j)         // fill in the blank
7      sync

d. Fill in the three missing expressions in line 8 of P-Scan-Up and lines 5 and 6
of P-Scan-Down. Argue that with the expressions you supplied, P-Scan-3 is
correct. (Hint: Prove that the value v passed to P-Scan-Down(v, x, t, y, i, j)
satisfies $v = x[1] \otimes x[2] \otimes \cdots \otimes x[i - 1]$.)

e. Analyze the work, span, and parallelism of P-Scan-3.

27-5  Multithreading  a  simple  stencil  calculation 

Computational science is replete with algorithms that require the entries of an array
to be filled in with values that depend on the values of certain already computed
neighboring entries, along with other information that does not change over the
course of the computation. The pattern of neighboring entries does not change
during the computation and is called a stencil. For example, Section 15.4 presents
a stencil algorithm to compute a longest common subsequence, where the value in
entry $c[i, j]$ depends only on the values in $c[i-1, j]$, $c[i, j-1]$, and $c[i-1, j-1]$,
as well as the elements $x_i$ and $y_j$ within the two sequences given as inputs. The
input sequences are fixed, but the algorithm fills in the two-dimensional array c so
that it computes entry $c[i, j]$ after computing all three entries $c[i-1, j]$, $c[i, j-1]$,
and $c[i-1, j-1]$.

In this problem, we examine how to use nested parallelism to multithread a
simple stencil calculation on an $n \times n$ array A in which, of the values in A, the
value placed into entry $A[i, j]$ depends only on values in $A[i', j']$, where $i' \le i$
and $j' \le j$ (and of course, $i' \ne i$ or $j' \ne j$). In other words, the value in an
entry depends only on values in entries that are above it and/or to its left, along
with static information outside of the array. Furthermore, we assume throughout
this problem that once we have filled in the entries upon which $A[i, j]$ depends, we
can fill in $A[i, j]$ in $\Theta(1)$ time (as in the LCS-Length procedure of Section 15.4).

We can partition the $n \times n$ array A into four $n/2 \times n/2$ subarrays as follows:

\[
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} .
\tag{27.11}
\]

Observe now that we can fill in subarray $A_{11}$ recursively, since it does not depend
on the entries of the other three subarrays. Once $A_{11}$ is complete, we can continue
to fill in $A_{12}$ and $A_{21}$ recursively in parallel, because although they both depend
on $A_{11}$, they do not depend on each other. Finally, we can fill in $A_{22}$ recursively.

a. Give multithreaded pseudocode that performs this simple stencil calculation
using a divide-and-conquer algorithm Simple-Stencil based on the
decomposition (27.11) and the discussion above. (Don’t worry about the details of the
base case, which depends on the specific stencil.) Give and solve recurrences
for the work and span of this algorithm in terms of n. What is the parallelism?

b. Modify your solution to part (a) to divide an $n \times n$ array into nine $n/3 \times n/3$
subarrays, again recursing with as much parallelism as possible. Analyze this
algorithm. How much more or less parallelism does this algorithm have
compared with the algorithm from part (a)?

c. Generalize your solutions to parts (a) and (b) as follows. Choose an integer
$b \ge 2$. Divide an $n \times n$ array into $b^2$ subarrays, each of size $n/b \times n/b$, recursing
with as much parallelism as possible. In terms of n and b, what are the work,
span, and parallelism of your algorithm? Argue that, using this approach, the
parallelism must be $o(n)$ for any choice of $b \ge 2$. (Hint: For this last argument,
show that the exponent of n in the parallelism is strictly less than 1 for any
choice of $b \ge 2$.)

d. Give pseudocode for a multithreaded algorithm for this simple stencil
calculation that achieves $\Theta(n/\lg n)$ parallelism. Argue using notions of work and
span that the problem, in fact, has $\Theta(n)$ inherent parallelism. As it turns out,
the divide-and-conquer nature of our multithreaded pseudocode does not let us
achieve this maximal parallelism.

27-6  Randomized  multithreaded  algorithms 

Just  as  with  ordinary  serial  algorithms,  we  sometimes  want  to  implement  random¬ 
ized  multithreaded  algorithms.  This  problem  explores  how  to  adapt  the  various 
performance  measures  in  order  to  handle  the  expected  behavior  of  such  algorithms. 
It  also  asks  you  to  design  and  analyze  a  multithreaded  algorithm  for  randomized 
quicksort. 

a. Explain how to modify the work law (27.2), span law (27.3), and greedy
scheduler bound (27.4) to work with expectations when $T_P$, $T_1$, and $T_\infty$ are all
random variables.

b. Consider a randomized multithreaded algorithm for which 1% of the time we
have $T_1 = 10^4$ and $T_{10{,}000} = 1$, but for 99% of the time we have
$T_1 = T_{10{,}000} = 10^9$. Argue that the speedup of a randomized multithreaded
algorithm should be defined as $E[T_1]/E[T_P]$, rather than $E[T_1/T_P]$.

c. Argue that the parallelism of a randomized multithreaded algorithm should be
defined as the ratio $E[T_1]/E[T_\infty]$.

d. Multithread the Randomized-Quicksort algorithm on page 179 by using
nested parallelism. (Do not parallelize Randomized-Partition.) Give the
pseudocode for your P-Randomized-Quicksort algorithm.

e. Analyze your multithreaded algorithm for randomized quicksort. (Hint:
Review the analysis of Randomized-Select on page 216.)


Chapter  notes 

Parallel computers, models for parallel computers, and algorithmic models for
parallel programming have been around in various forms for years. Prior editions of
this book included material on sorting networks and the PRAM (Parallel
Random-Access Machine) model. The data-parallel model [48, 168] is another popular
algorithmic programming model, which features operations on vectors and matrices
as primitives.

Graham [149] and Brent [55] showed that there exist schedulers achieving the
bound  of  Theorem  27.1.  Eager,  Zahorjan,  and  Lazowska  [98]  showed  that  any 
greedy  scheduler  achieves  this  bound  and  proposed  the  methodology  of  using  work 
and  span  (although  not  by  those  names)  to  analyze  parallel  algorithms.  Blelloch 
[47]  developed  an  algorithmic  programming  model  based  on  work  and  span  (which 
he  called  the  “depth”  of  the  computation)  for  data-parallel  programming.  Blumofe 
and Leiserson [52] gave a distributed scheduling algorithm for dynamic
multithreading based on randomized “work-stealing” and showed that it achieves the
bound $E[T_P] \le T_1/P + O(T_\infty)$. Arora, Blumofe, and Plaxton [19] and Blelloch,
Gibbons, and Matias [49] also provided provably good algorithms for scheduling
dynamic multithreaded computations.

The  multithreaded  pseudocode  and  programming  model  were  heavily  influenced 
by  the  Cilk  [51,  118]  project  at  MIT  and  the  Cilk++  [71]  extensions  to  C++  dis¬ 
tributed  by  Cilk  Arts,  Inc.  Many  of  the  multithreaded  algorithms  in  this  chapter 
appeared  in  unpublished  lecture  notes  by  C.  E.  Leiserson  and  H.  Prokop  and  have 
been  implemented  in  Cilk  or  Cilk++.  The  multithreaded  merge-sorting  algorithm 
was  inspired  by  an  algorithm  of  Akl  [12]. 

The notion of sequential consistency is due to Lamport [223].


Because operations on matrices lie at the heart of scientific computing, efficient
algorithms for working with matrices have many practical applications. This chapter
focuses on how to multiply matrices and solve sets of simultaneous linear
equations. Appendix D reviews the basics of matrices.

Section  28.1  shows  how  to  solve  a  set  of  linear  equations  using  LUP  decomposi¬ 
tions.  Then,  Section  28.2  explores  the  close  relationship  between  multiplying  and 
inverting  matrices.  Finally,  Section  28.3  discusses  the  important  class  of  symmetric 
positive-definite  matrices  and  shows  how  we  can  use  them  to  find  a  least-squares 
solution  to  an  overdetermined  set  of  linear  equations. 

One  important  issue  that  arises  in  practice  is  numerical  stability.  Due  to  the 
limited  precision  of  floating-point  representations  in  actual  computers,  round-off 
errors  in  numerical  computations  may  become  amplified  over  the  course  of  a  com¬ 
putation,  leading  to  incorrect  results;  we  call  such  computations  numerically  un¬ 
stable.  Although  we  shall  briefly  consider  numerical  stability  on  occasion,  we  do 
not  focus  on  it  in  this  chapter.  We  refer  you  to  the  excellent  book  by  Golub  and 
Van  Loan  [144]  for  a  thorough  discussion  of  stability  issues. 


28.1  Solving  systems  of  linear  equations 

Numerous applications need to solve sets of simultaneous linear equations. We
can formulate a linear system as a matrix equation in which each matrix or vector
element belongs to a field, typically the real numbers $\mathbb{R}$. This section discusses how
to solve a system of linear equations using a method called LUP decomposition.
We start with a set of linear equations in n unknowns $x_1, x_2, \ldots, x_n$:

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 , \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 , \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n .
\end{aligned}
\tag{28.1}
\]

A solution to the equations (28.1) is a set of values for $x_1, x_2, \ldots, x_n$ that satisfy
all of the equations simultaneously. In this section, we treat only the case in which
there are exactly n equations in n unknowns.

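We can conveniently rewrite equations (28.1) as the matrix-vector equation

\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}
\]

or, equivalently, letting $A = (a_{ij})$, $x = (x_i)$, and $b = (b_i)$, as

\[
Ax = b .
\tag{28.2}
\]

If A is nonsingular, it possesses an inverse $A^{-1}$, and

\[
x = A^{-1}b
\tag{28.3}
\]

is the solution vector. We can prove that x is the unique solution to equation (28.2)
as follows. If there are two solutions, x and x', then Ax = Ax' = b and, letting I
denote an identity matrix,

\[
\begin{aligned}
x &= Ix \\
  &= (A^{-1}A)x \\
  &= A^{-1}(Ax) \\
  &= A^{-1}(Ax') \\
  &= (A^{-1}A)x' \\
  &= x' .
\end{aligned}
\]
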
In this section, we shall be concerned predominantly with the case in which A
is nonsingular or, equivalently (by Theorem D.1), the rank of A is equal to the
number n of unknowns. There are other possibilities, however, which merit a brief
discussion. If the number of equations is less than the number n of unknowns (or,
more generally, if the rank of A is less than n), then the system is
underdetermined. An underdetermined system typically has infinitely many solutions,
although it may have no solutions at all if the equations are inconsistent. If the
number of equations exceeds the number n of unknowns, the system is
overdetermined, and there may not exist any solutions. Section 28.3 addresses the important
problem of finding good approximate solutions to overdetermined systems of linear
equations.

Let us return to our problem of solving the system Ax = b of n equations in n
unknowns. We could compute $A^{-1}$ and then, using equation (28.3), multiply b
by $A^{-1}$, yielding $x = A^{-1}b$. This approach suffers in practice from numerical
instability. Fortunately, another approach, LUP decomposition, is numerically
stable and has the further advantage of being faster in practice.

Overview  of  LUP  decomposition 

The idea behind LUP decomposition is to find three $n \times n$ matrices L, U, and P
such that

\[
PA = LU ,
\tag{28.4}
\]


where 

•  L  is  a  unit  lower-triangular  matrix, 

•  U  is  an  upper-triangular  matrix,  and 

•  P  is  a  permutation  matrix. 

We  call  matrices  L,  U ,  and  P  satisfying  equation  (28.4)  an  LUP  decomposition 
of  the  matrix  A.  We  shall  show  that  every  nonsingular  matrix  A  possesses  such  a 
decomposition. 

Computing an LUP decomposition for the matrix A has the advantage that we
can more easily solve linear systems when they are triangular, as is the case for
both matrices L and U. Once we have found an LUP decomposition for A, we
can solve equation (28.2), Ax = b, by solving only triangular linear systems, as
follows. Multiplying both sides of Ax = b by P yields the equivalent equation
PAx = Pb, which, by Exercise D.1-4, amounts to permuting the equations (28.1).
Using our decomposition (28.4), we obtain

\[
LUx = Pb .
\]

We can now solve this equation by solving two triangular linear systems. Let us
define y = Ux, where x is the desired solution vector. First, we solve the
lower-triangular system

\[
Ly = Pb
\tag{28.5}
\]

for the unknown vector y by a method called “forward substitution.” Having solved
for y, we then solve the upper-triangular system

\[
Ux = y
\tag{28.6}
\]

for the unknown x by a method called “back substitution.” Because the
permutation matrix P is invertible (Exercise D.2-3), multiplying both sides of
equation (28.4) by $P^{-1}$ gives $P^{-1}PA = P^{-1}LU$, so that

\[
A = P^{-1}LU .
\tag{28.7}
\]

Hence, the vector x is our solution to Ax = b:

\[
\begin{aligned}
Ax &= P^{-1}LUx && \text{(by equation (28.7))} \\
   &= P^{-1}Ly  && \text{(by equation (28.6))} \\
   &= P^{-1}Pb  && \text{(by equation (28.5))} \\
   &= b .
\end{aligned}
\]

Our  next  step  is  to  show  how  forward  and  back  substitution  work  and  then  attack 
the  problem  of  computing  the  LUP  decomposition  itself. 

Forward  and  back  substitution 

Forward substitution can solve the lower-triangular system (28.5) in $\Theta(n^2)$ time,
given L, P, and b. For convenience, we represent the permutation P compactly
by an array $\pi[1 \,..\, n]$. For $i = 1, 2, \ldots, n$, the entry $\pi[i]$ indicates that $P_{i,\pi[i]} = 1$
and $P_{ij} = 0$ for $j \ne \pi[i]$. Thus, PA has $a_{\pi[i],j}$ in row i and column j, and Pb
has $b_{\pi[i]}$ as its ith element. Since L is unit lower-triangular, we can rewrite
equation (28.5) as

\[
\begin{aligned}
y_1 &= b_{\pi[1]} , \\
l_{21}y_1 + y_2 &= b_{\pi[2]} , \\
l_{31}y_1 + l_{32}y_2 + y_3 &= b_{\pi[3]} , \\
&\;\;\vdots \\
l_{n1}y_1 + l_{n2}y_2 + l_{n3}y_3 + \cdots + y_n &= b_{\pi[n]} .
\end{aligned}
\]

The first equation tells us that $y_1 = b_{\pi[1]}$. Knowing the value of $y_1$, we can
substitute it into the second equation, yielding

\[
y_2 = b_{\pi[2]} - l_{21}y_1 .
\]

Now, we can substitute both $y_1$ and $y_2$ into the third equation, obtaining

\[
y_3 = b_{\pi[3]} - (l_{31}y_1 + l_{32}y_2) .
\]

In general, we substitute $y_1, y_2, \ldots, y_{i-1}$ “forward” into the ith equation to solve
for $y_i$:

\[
y_i = b_{\pi[i]} - \sum_{j=1}^{i-1} l_{ij}y_j .
\]

Having solved for y, we solve for x in equation (28.6) using back substitution,
which is similar to forward substitution. Here, we solve the nth equation first and
work backward to the first equation. Like forward substitution, this process runs
in $\Theta(n^2)$ time. Since U is upper-triangular, we can rewrite the system (28.6) as

\[
\begin{aligned}
u_{11}x_1 + u_{12}x_2 + \cdots + u_{1,n-2}x_{n-2} + u_{1,n-1}x_{n-1} + u_{1n}x_n &= y_1 , \\
u_{22}x_2 + \cdots + u_{2,n-2}x_{n-2} + u_{2,n-1}x_{n-1} + u_{2n}x_n &= y_2 , \\
&\;\;\vdots \\
u_{n-2,n-2}x_{n-2} + u_{n-2,n-1}x_{n-1} + u_{n-2,n}x_n &= y_{n-2} , \\
u_{n-1,n-1}x_{n-1} + u_{n-1,n}x_n &= y_{n-1} , \\
u_{n,n}x_n &= y_n .
\end{aligned}
\]

Thus, we can solve for $x_n, x_{n-1}, \ldots, x_1$ successively as follows:

\[
\begin{aligned}
x_n &= y_n/u_{n,n} , \\
x_{n-1} &= (y_{n-1} - u_{n-1,n}x_n)/u_{n-1,n-1} , \\
x_{n-2} &= (y_{n-2} - (u_{n-2,n-1}x_{n-1} + u_{n-2,n}x_n))/u_{n-2,n-2} , \\
&\;\;\vdots
\end{aligned}
\]

or, in general,

\[
x_i = \Bigl( y_i - \sum_{j=i+1}^{n} u_{ij}x_j \Bigr) / u_{ii} .
\]

Given P, L, U, and b, the procedure LUP-Solve solves for x by combining
forward and back substitution. The pseudocode assumes that the dimension n
appears in the attribute L.rows and that the permutation matrix P is represented by
the array $\pi$.

LUP-Solve(L, U, π, b)
1  n = L.rows
2  let x be a new vector of length n
3  for i = 1 to n
4      $y_i = b_{\pi[i]} - \sum_{j=1}^{i-1} l_{ij}y_j$
5  for i = n downto 1
6      $x_i = \bigl(y_i - \sum_{j=i+1}^{n} u_{ij}x_j\bigr)/u_{ii}$
7  return x

Procedure LUP-Solve solves for y using forward substitution in lines 3-4, and
then it solves for x using backward substitution in lines 5-6. Since the summation
within each of the for loops includes an implicit loop, the running time is $\Theta(n^2)$.
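
A direct Python transcription of LUP-Solve might look like the sketch below. It
is an illustration under stated assumptions rather than a reference
implementation: matrices are plain lists of rows, indices are 0-based (so the
permutation array `pi` holds 0-based row numbers), and the function name
`lup_solve` is chosen for the example.

```python
def lup_solve(L, U, pi, b):
    """Solve Ax = b given PA = LU, with P encoded by the 0-based array pi."""
    n = len(L)
    y = [0.0] * n
    x = [0.0] * n
    # Forward substitution: solve Ly = Pb. L has 1s on its diagonal,
    # so no division is needed.
    for i in range(n):
        y[i] = b[pi[i]] - sum(L[i][j] * y[j] for j in range(i))
    # Back substitution: solve Ux = y, starting from the last equation.
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# Sample 3 x 3 factors with PA = LU; the right-hand side b is assumed
# here for illustration.
L = [[1, 0, 0], [0.2, 1, 0], [0.6, 0.5, 1]]
U = [[5, 6, 3], [0, 0.8, -0.6], [0, 0, 2.5]]
pi = [2, 0, 1]
b = [3, 7, 8]
x = lup_solve(L, U, pi, b)
```

Each of the two loops evaluates a sum of up to n terms per iteration, which is
the implicit inner loop responsible for the quadratic running time.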

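As an example of these methods, consider the system of linear equations defined
by

\[
\begin{pmatrix} 1 & 2 & 0 \\ 3 & 4 & 4 \\ 5 & 6 & 3 \end{pmatrix} x
= \begin{pmatrix} 3 \\ 7 \\ 8 \end{pmatrix} ,
\]

where

\[
A = \begin{pmatrix} 1 & 2 & 0 \\ 3 & 4 & 4 \\ 5 & 6 & 3 \end{pmatrix} ,
\qquad
b = \begin{pmatrix} 3 \\ 7 \\ 8 \end{pmatrix} ,
\]

and we wish to solve for the unknown x. The LUP decomposition is

\[
L = \begin{pmatrix} 1 & 0 & 0 \\ 0.2 & 1 & 0 \\ 0.6 & 0.5 & 1 \end{pmatrix} ,
\qquad
U = \begin{pmatrix} 5 & 6 & 3 \\ 0 & 0.8 & -0.6 \\ 0 & 0 & 2.5 \end{pmatrix} ,
\qquad
P = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} .
\]

(You might want to verify that PA = LU.) Using forward substitution, we solve
Ly = Pb for y:

\[
\begin{pmatrix} 1 & 0 & 0 \\ 0.2 & 1 & 0 \\ 0.6 & 0.5 & 1 \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
= \begin{pmatrix} 8 \\ 3 \\ 7 \end{pmatrix} ,
\]

obtaining

\[
y = \begin{pmatrix} 8 \\ 1.4 \\ 1.5 \end{pmatrix}
\]

by computing first $y_1$, then $y_2$, and finally $y_3$. Using back substitution, we solve
Ux = y for x:

\[
\begin{pmatrix} 5 & 6 & 3 \\ 0 & 0.8 & -0.6 \\ 0 & 0 & 2.5 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 8 \\ 1.4 \\ 1.5 \end{pmatrix} ,
\]

thereby obtaining the desired answer

\[
x = \begin{pmatrix} -1.4 \\ 2.2 \\ 0.6 \end{pmatrix}
\]

by computing first $x_3$, then $x_2$, and finally $x_1$.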


Computing  an  LU  decomposition 


We have now shown that if we can create an LUP decomposition for a nonsingular
matrix A, then forward and back substitution can solve the system Ax = b of
linear equations. Now we show how to efficiently compute an LUP decomposition
for A. We start with the case in which A is an $n \times n$ nonsingular matrix and P is
absent (or, equivalently, $P = I_n$). In this case, we factor A = LU. We call the
two matrices L and U an LU decomposition of A.

We use a process known as Gaussian elimination to create an LU
decomposition. We start by subtracting multiples of the first equation from the other equations
in order to remove the first variable from those equations. Then, we subtract
multiples of the second equation from the third and subsequent equations so that now
the first and second variables are removed from them. We continue this process
until the system that remains has an upper-triangular form; in fact, it is the
matrix U. The matrix L is made up of the row multipliers that cause variables to be
eliminated.

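Our algorithm to implement this strategy is recursive. We wish to construct an
LU decomposition for an $n \times n$ nonsingular matrix A. If n = 1, then we are done,
since we can choose $L = I_1$ and U = A. For n > 1, we break A into four parts:

\[
A =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
=
\begin{pmatrix} a_{11} & w^{\mathrm{T}} \\ v & A' \end{pmatrix} ,
\]

where v is a column (n - 1)-vector, $w^{\mathrm{T}}$ is a row (n - 1)-vector, and A' is an
$(n-1) \times (n-1)$ matrix. Then, using matrix algebra (verify the equations by
simply multiplying through), we can factor A as

\[
A = \begin{pmatrix} a_{11} & w^{\mathrm{T}} \\ v & A' \end{pmatrix}
  = \begin{pmatrix} 1 & 0 \\ v/a_{11} & I_{n-1} \end{pmatrix}
    \begin{pmatrix} a_{11} & w^{\mathrm{T}} \\ 0 & A' - vw^{\mathrm{T}}/a_{11} \end{pmatrix} .
\tag{28.8}
\]

The 0s in the first and second matrices of equation (28.8) are row and
column (n - 1)-vectors, respectively. The term $vw^{\mathrm{T}}/a_{11}$, formed by taking the
outer product of v and w and dividing each element of the result by $a_{11}$, is an
$(n-1) \times (n-1)$ matrix, which conforms in size to the matrix A' from which it is
subtracted. The resulting $(n-1) \times (n-1)$ matrix

\[
A' - vw^{\mathrm{T}}/a_{11}
\tag{28.9}
\]

is called the Schur complement of A with respect to $a_{11}$.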
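
For a concrete feel, the Schur complement can be computed in a few lines of
Python. This is a small sketch under assumptions: the matrix is a list of rows
with a nonzero upper-left entry, the function name `schur_complement` is
hypothetical, and the 4 x 4 sample matrix is chosen only for illustration.

```python
def schur_complement(A):
    """Return A' - v w^T / a11 for a square matrix A given as a list of rows.

    Assumes A[0][0] is nonzero.
    """
    n = len(A)
    a11 = A[0][0]
    # Entry (i, j) of the result is a_ij - v_i * w_j / a11, where
    # v_i = A[i][0] and w_j = A[0][j] for i, j >= 1.
    return [[A[i][j] - A[i][0] * A[0][j] / a11 for j in range(1, n)]
            for i in range(1, n)]

# Sample matrix, assumed for illustration.
A = [[2, 3, 1, 5],
     [6, 13, 5, 19],
     [2, 19, 10, 23],
     [4, 10, 11, 31]]
S = schur_complement(A)
```

The result is one row and one column smaller than the input, which is what lets
the recursive strategy shrink the problem at every step.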

We claim that if A is nonsingular, then the Schur complement is nonsingular,
too. Why? Suppose that the Schur complement, which is $(n-1) \times (n-1)$, is
singular. Then by Theorem D.1, it has row rank strictly less than n - 1. Because
the bottom n - 1 entries in the first column of the matrix

\[
\begin{pmatrix} a_{11} & w^{\mathrm{T}} \\ 0 & A' - vw^{\mathrm{T}}/a_{11} \end{pmatrix}
\]

are all 0, the bottom n - 1 rows of this matrix must have row rank strictly less
than n - 1. The row rank of the entire matrix, therefore, is strictly less than n.
Applying Exercise D.2-8 to equation (28.8), A has rank strictly less than n, and
from Theorem D.1 we derive the contradiction that A is singular.

Because the Schur complement is nonsingular, we can now recursively find an
LU decomposition for it. Let us say that

\[
A' - vw^{\mathrm{T}}/a_{11} = L'U' ,
\]


where L' is unit lower-triangular and U' is upper-triangular. Then, using matrix
algebra, we have

\[
\begin{aligned}
A &= \begin{pmatrix} 1 & 0 \\ v/a_{11} & I_{n-1} \end{pmatrix}
     \begin{pmatrix} a_{11} & w^{\mathrm{T}} \\ 0 & A' - vw^{\mathrm{T}}/a_{11} \end{pmatrix} \\
  &= \begin{pmatrix} 1 & 0 \\ v/a_{11} & I_{n-1} \end{pmatrix}
     \begin{pmatrix} a_{11} & w^{\mathrm{T}} \\ 0 & L'U' \end{pmatrix} \\
  &= \begin{pmatrix} 1 & 0 \\ v/a_{11} & L' \end{pmatrix}
     \begin{pmatrix} a_{11} & w^{\mathrm{T}} \\ 0 & U' \end{pmatrix} \\
  &= LU ,
\end{aligned}
\]

thereby providing our LU decomposition. (Note that because L' is unit
lower-triangular, so is L, and because U' is upper-triangular, so is U.)

Of course, if $a_{11} = 0$, this method doesn’t work, because it divides by 0. It also
doesn’t work if the upper leftmost entry of the Schur complement $A' - vw^{\mathrm{T}}/a_{11}$
is 0, since we divide by it in the next step of the recursion. The elements by
which we divide during LU decomposition are called pivots, and they occupy the
diagonal elements of the matrix U. The reason we include a permutation matrix P
during LUP decomposition is that it allows us to avoid dividing by 0. When we use
permutations to avoid division by 0 (or by small numbers, which would contribute
to numerical instability), we are pivoting.

An important class of matrices for which LU decomposition always works
correctly is the class of symmetric positive-definite matrices. Such matrices require
no pivoting, and thus we can employ the recursive strategy outlined above
without fear of dividing by 0. We shall prove this result, as well as several others, in
Section 28.3.

Our code for LU decomposition of a matrix A follows the recursive strategy,
except that an iteration loop replaces the recursion. (This transformation is a standard
optimization for a “tail-recursive” procedure, one whose last operation is a
recursive call to itself. See Problem 7-4.) It assumes that the attribute A.rows gives
the dimension of A. We initialize the matrix U with 0s below the diagonal and
matrix L with 1s on its diagonal and 0s above the diagonal.

LU-Decomposition(A)
 1  n = A.rows
 2  let L and U be new n x n matrices
 3  initialize U with 0s below the diagonal
 4  initialize L with 1s on the diagonal and 0s above the diagonal
 5  for k = 1 to n
 6      u_kk = a_kk
 7      for i = k + 1 to n
 8          l_ik = a_ik / u_kk        // l_ik holds v_i
 9          u_ki = a_ki               // u_ki holds w_i^T
10      for i = k + 1 to n
11          for j = k + 1 to n
12              a_ij = a_ij - l_ik u_kj
13  return L and U

The outer for loop beginning in line 5 iterates once for each recursive step. Within this loop, line 6 determines the pivot to be $u_{kk} = a_{kk}$. The for loop in lines 7-9 (which does not execute when $k = n$) uses the $v$ and $w^T$ vectors to update $L$ and $U$. Line 8 determines the elements of the $v$ vector, storing $v_i$ in $l_{ik}$, and line 9 computes the elements of the $w^T$ vector, storing $w_i^T$ in $u_{ki}$. Finally, lines 10-12 compute the elements of the Schur complement and store them back into the matrix $A$. (We don't need to divide by $a_{kk}$ in line 12 because we already did so when we computed $l_{ik}$ in line 8.) Because line 12 is triply nested, LU-Decomposition runs in time $\Theta(n^3)$.

Figure 28.1 The operation of LU-Decomposition. (a) The matrix $A$. (b) The element $a_{11} = 2$ in the black circle is the pivot, the shaded column is $v/a_{11}$, and the shaded row is $w^T$. The elements of $U$ computed thus far are above the horizontal line, and the elements of $L$ are to the left of the vertical line. The Schur complement matrix $A' - v w^T / a_{11}$ occupies the lower right. (c) We now operate on the Schur complement matrix produced from part (b). The element $a_{22} = 4$ in the black circle is the pivot, and the shaded column and row are $v/a_{22}$ and $w^T$ (in the partitioning of the Schur complement), respectively. Lines divide the matrix into the elements of $U$ computed so far (above), the elements of $L$ computed so far (left), and the new Schur complement (lower right). (d) After the next step, the matrix $A$ is factored. (The element 3 in the new Schur complement becomes part of $U$ when the recursion terminates.) (e) The factorization $A = LU$.

Figure 28.1 illustrates the operation of LU-Decomposition. It shows a standard optimization of the procedure in which we store the significant elements of $L$ and $U$ in place in the matrix $A$. That is, we can set up a correspondence between each element $a_{ij}$ and either $l_{ij}$ (if $i > j$) or $u_{ij}$ (if $i \le j$) and update the matrix $A$ so that it holds both $L$ and $U$ when the procedure terminates. To obtain the pseudocode for this optimization from the above pseudocode, just replace each reference to $l$ or $u$ by $a$; you can easily verify that this transformation preserves correctness.
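The procedure above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `lu_decomposition`, the use of 0-based indices, and the choice of exact rational arithmetic (`fractions.Fraction`) are assumptions made here for clarity, and the 4 x 4 input is simply an illustrative matrix whose pivots happen to be nonzero.

```python
from fractions import Fraction


def lu_decomposition(a):
    """Return (L, U) with A = L U, mirroring LU-Decomposition (no pivoting)."""
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]   # work on a copy of A
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    U = [[Fraction(0)] * n for _ in range(n)]
    for k in range(n):
        U[k][k] = a[k][k]                           # the pivot
        if U[k][k] == 0 and k < n - 1:
            raise ZeroDivisionError("zero pivot: pivoting (LUP) is required")
        for i in range(k + 1, n):
            L[i][k] = a[i][k] / U[k][k]             # column of L: v / pivot
            U[k][i] = a[k][i]                       # row of U: w^T
        for i in range(k + 1, n):                   # Schur complement update
            for j in range(k + 1, n):
                a[i][j] -= L[i][k] * U[k][j]
    return L, U


A = [[2, 3, 1, 5],
     [6, 13, 5, 19],
     [2, 19, 10, 23],
     [4, 10, 11, 31]]
L, U = lu_decomposition(A)
```

Multiplying the returned `L` and `U` reproduces `A`. Exact fractions are used only so that the result can be checked without rounding; a floating-point version has the same structure.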


Computing  an  LUP  decomposition 

Generally, in solving a system of linear equations $Ax = b$, we must pivot on off-diagonal elements of $A$ to avoid dividing by 0. Dividing by 0 would, of course, be disastrous. But we also want to avoid dividing by a small value, even if $A$ is nonsingular, because numerical instabilities can result. We therefore try to pivot on a large value.

The mathematics behind LUP decomposition is similar to that of LU decomposition. Recall that we are given an $n \times n$ nonsingular matrix $A$, and we wish to find a permutation matrix $P$, a unit lower-triangular matrix $L$, and an upper-triangular matrix $U$ such that $PA = LU$. Before we partition the matrix $A$, as we did for LU decomposition, we move a nonzero element, say $a_{k1}$, from somewhere in the first column to the (1, 1) position of the matrix. For numerical stability, we choose $a_{k1}$ as the element in the first column with the greatest absolute value. (The first column cannot contain only 0s, for then $A$ would be singular, because its determinant would be 0, by Theorems D.4 and D.5.) In order to preserve the set of equations, we exchange row 1 with row $k$, which is equivalent to multiplying $A$ by a permutation matrix $Q$ on the left (Exercise D.1-4). Thus, we can write $QA$ as


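$$QA = \begin{pmatrix} a_{k1} & w^T \\ v & A' \end{pmatrix} ,$$

where $v = (a_{21}, a_{31}, \ldots, a_{n1})^T$, except that $a_{11}$ replaces $a_{k1}$; $w^T = (a_{k2}, a_{k3}, \ldots, a_{kn})$; and $A'$ is an $(n-1) \times (n-1)$ matrix. Since $a_{k1} \ne 0$, we can now perform much the same linear algebra as for LU decomposition, but now guaranteeing that we do not divide by 0:

$$QA = \begin{pmatrix} a_{k1} & w^T \\ v & A' \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ v/a_{k1} & I_{n-1} \end{pmatrix} \begin{pmatrix} a_{k1} & w^T \\ 0 & A' - v w^T / a_{k1} \end{pmatrix} .$$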


As we saw for LU decomposition, if $A$ is nonsingular, then the Schur complement $A' - v w^T / a_{k1}$ is nonsingular, too. Therefore, we can recursively find an LUP decomposition for it, with unit lower-triangular matrix $L'$, upper-triangular matrix $U'$, and permutation matrix $P'$, such that


$$P'(A' - v w^T / a_{k1}) = L'U' .$$


Define

$$P = \begin{pmatrix} 1 & 0 \\ 0 & P' \end{pmatrix} Q ,$$

which is a permutation matrix, since it is the product of two permutation matrices (Exercise D.1-4). We now have

$$\begin{aligned}
PA &= \begin{pmatrix} 1 & 0 \\ 0 & P' \end{pmatrix} QA \\
   &= \begin{pmatrix} 1 & 0 \\ 0 & P' \end{pmatrix} \begin{pmatrix} 1 & 0 \\ v/a_{k1} & I_{n-1} \end{pmatrix} \begin{pmatrix} a_{k1} & w^T \\ 0 & A' - v w^T / a_{k1} \end{pmatrix} \\
   &= \begin{pmatrix} 1 & 0 \\ P'v/a_{k1} & P' \end{pmatrix} \begin{pmatrix} a_{k1} & w^T \\ 0 & A' - v w^T / a_{k1} \end{pmatrix} \\
   &= \begin{pmatrix} 1 & 0 \\ P'v/a_{k1} & I_{n-1} \end{pmatrix} \begin{pmatrix} a_{k1} & w^T \\ 0 & P'(A' - v w^T / a_{k1}) \end{pmatrix} \\
   &= \begin{pmatrix} 1 & 0 \\ P'v/a_{k1} & I_{n-1} \end{pmatrix} \begin{pmatrix} a_{k1} & w^T \\ 0 & L'U' \end{pmatrix} \\
   &= \begin{pmatrix} 1 & 0 \\ P'v/a_{k1} & L' \end{pmatrix} \begin{pmatrix} a_{k1} & w^T \\ 0 & U' \end{pmatrix} \\
   &= LU ,
\end{aligned}$$

yielding the LUP decomposition. Because $L'$ is unit lower-triangular, so is $L$, and because $U'$ is upper-triangular, so is $U$.

Notice that in this derivation, unlike the one for LU decomposition, we must multiply both the column vector $v/a_{k1}$ and the Schur complement $A' - v w^T / a_{k1}$ by the permutation matrix $P'$. Here is the pseudocode for LUP decomposition:

LUP-Decomposition(A)
 1  n = A.rows
 2  let π[1 .. n] be a new array
 3  for i = 1 to n
 4      π[i] = i
 5  for k = 1 to n
 6      p = 0
 7      for i = k to n
 8          if |a_ik| > p
 9              p = |a_ik|
10              k' = i
11      if p == 0
12          error "singular matrix"
13      exchange π[k] with π[k']
14      for i = 1 to n
15          exchange a_ki with a_k'i
16      for i = k + 1 to n
17          a_ik = a_ik / a_kk
18          for j = k + 1 to n
19              a_ij = a_ij - a_ik a_kj



Like LU-Decomposition, our LUP-Decomposition procedure replaces the recursion with an iteration loop. As an improvement over a direct implementation of the recursion, we dynamically maintain the permutation matrix $P$ as an array $\pi$, where $\pi[i] = j$ means that the $i$th row of $P$ contains a 1 in column $j$. We also implement the code to compute $L$ and $U$ "in place" in the matrix $A$. Thus, when the procedure terminates,

$$a_{ij} = \begin{cases} l_{ij} & \text{if } i > j , \\ u_{ij} & \text{if } i \le j . \end{cases}$$

Figure 28.2 illustrates how LUP-Decomposition factors a matrix. Lines 3-4 initialize the array $\pi$ to represent the identity permutation. The outer for loop beginning in line 5 implements the recursion. Each time through the outer loop, lines 6-10 determine the element $a_{k'k}$ with largest absolute value of those in the current first column (column $k$) of the $(n - k + 1) \times (n - k + 1)$ matrix whose LUP decomposition we are finding. If all elements in the current first column are zero, lines 11-12 report that the matrix is singular. To pivot, we exchange $\pi[k']$ with $\pi[k]$ in line 13 and exchange the $k$th and $k'$th rows of $A$ in lines 14-15, thereby making the pivot element $a_{kk}$. (The entire rows are swapped because in the derivation of the method above, not only is $A' - v w^T / a_{k1}$ multiplied by $P'$, but so is $v/a_{k1}$.) Finally, the Schur complement is computed by lines 16-19 in much the same way as it is computed by lines 7-12 of LU-Decomposition, except that here the operation is written to work in place.

Because of its triply nested loop structure, LUP-Decomposition has a running time of $\Theta(n^3)$, which is the same as that of LU-Decomposition. Thus, pivoting costs us at most a constant factor in time.
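As a rough Python counterpart to LUP-Decomposition, the sketch below keeps the same in-place layout and permutation array. The name `lup_decomposition`, the 0-based indexing, and the use of `fractions.Fraction` are assumptions of this sketch; with exact rationals the largest-magnitude pivot is not needed for stability, but it is retained to follow the procedure as described.

```python
from fractions import Fraction


def lup_decomposition(a):
    """Return (pi, lu) such that P A = L U.

    pi[i] = j means row i of P has its 1 in column j (0-based).
    lu stores L strictly below the diagonal and U on and above it.
    """
    n = len(a)
    lu = [[Fraction(x) for x in row] for row in a]
    pi = list(range(n))                       # identity permutation
    for k in range(n):
        # Pick the row whose entry in column k has the largest magnitude.
        k_prime = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[k_prime][k] == 0:
            raise ValueError("singular matrix")
        pi[k], pi[k_prime] = pi[k_prime], pi[k]
        lu[k], lu[k_prime] = lu[k_prime], lu[k]   # swap entire rows
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]                  # entry of L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]   # Schur complement
    return pi, lu


A = [[1, 5, 4],
     [2, 0, 3],
     [5, 8, 2]]
pi, lu = lup_decomposition(A)
```

Row `i` of the product $LU$ recovered from `lu` equals row `pi[i]` of `A`, which is exactly the statement $PA = LU$.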

Exercises 

28.1-1 

Solve  the  equation 


by  using  forward  substitution. 


28.1-2 


Find an LU decomposition of the matrix

Figure 28.2 The operation of LUP-Decomposition. (a) The input matrix $A$ with the identity permutation of the rows on the left. The first step of the algorithm determines that the element 5 in the black circle in the third row is the pivot for the first column. (b) Rows 1 and 3 are swapped and the permutation is updated. The shaded column and row represent $v$ and $w^T$. (c) The vector $v$ is replaced by $v/5$, and the lower right of the matrix is updated with the Schur complement. Lines divide the matrix into three regions: elements of $U$ (above), elements of $L$ (left), and elements of the Schur complement (lower right). (d)-(f) The second step. (g)-(i) The third step. No further changes occur on the fourth (final) step. (j) The LUP decomposition $PA = LU$.

28.1-3

Solve  the  equation 

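$$\begin{pmatrix} 1 & 5 & 4 \\ 2 & 0 & 3 \\ 5 & 8 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 12 \\ 9 \\ 5 \end{pmatrix}$$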

by  using  an  LUP  decomposition. 


28.1-4 

Describe  the  LUP  decomposition  of  a  diagonal  matrix. 


28.1-5 

Describe  the  LUP  decomposition  of  a  permutation  matrix  A,  and  prove  that  it  is 
unique. 


28.1-6 

Show that for all $n \ge 1$, there exists a singular $n \times n$ matrix that has an LU decomposition.


28.1-7

In LU-Decomposition, is it necessary to perform the outermost for loop iteration when $k = n$? How about in LUP-Decomposition?


28.2  Inverting  matrices 

Although  in  practice  we  do  not  generally  use  matrix  inverses  to  solve  systems  of 
linear  equations,  preferring  instead  to  use  more  numerically  stable  techniques  such 
as  LUP  decomposition,  sometimes  we  need  to  compute  a  matrix  inverse.  In  this 
section,  we  show  how  to  use  LUP  decomposition  to  compute  a  matrix  inverse. 
We  also  prove  that  matrix  multiplication  and  computing  the  inverse  of  a  matrix 
are  equivalently  hard  problems,  in  that  (subject  to  technical  conditions)  we  can 
use  an  algorithm  for  one  to  solve  the  other  in  the  same  asymptotic  running  time. 
Thus,  we  can  use  Strassen’s  algorithm  (see  Section  4.2)  for  matrix  multiplication 
to invert a matrix. Indeed, Strassen's original paper was motivated by the problem of showing that a set of linear equations could be solved more quickly than by the usual method.

Computing a matrix inverse from an LUP decomposition

Suppose that we have an LUP decomposition of a matrix $A$ in the form of three matrices $L$, $U$, and $P$ such that $PA = LU$. Using LUP-Solve, we can solve an equation of the form $Ax = b$ in time $\Theta(n^2)$. Since the LUP decomposition depends on $A$ but not $b$, we can run LUP-Solve on a second set of equations of the form $Ax = b'$ in additional time $\Theta(n^2)$. In general, once we have the LUP decomposition of $A$, we can solve, in time $\Theta(kn^2)$, $k$ versions of the equation $Ax = b$ that differ only in $b$.

We  can  think  of  the  equation 

$$AX = I_n , \tag{28.10}$$

which defines the matrix $X$, the inverse of $A$, as a set of $n$ distinct equations of the form $Ax = b$. To be precise, let $X_i$ denote the $i$th column of $X$, and recall that the unit vector $e_i$ is the $i$th column of $I_n$. We can then solve equation (28.10) for $X$ by using the LUP decomposition for $A$ to solve each equation

$$A X_i = e_i$$

separately for $X_i$. Once we have the LUP decomposition, we can compute each of the $n$ columns $X_i$ in time $\Theta(n^2)$, and so we can compute $X$ from the LUP decomposition of $A$ in time $\Theta(n^3)$. Since we can determine the LUP decomposition of $A$ in time $\Theta(n^3)$, we can compute the inverse $A^{-1}$ of a matrix $A$ in time $\Theta(n^3)$.
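The column-by-column approach just described might look like the following in Python. This is an illustrative sketch only: `lup_decomposition`, `lup_solve`, and `invert` are names chosen here, indices are 0-based, and exact rationals are assumed so that the result can be checked exactly. The decomposition is computed once and then reused for each unit vector $e_i$.

```python
from fractions import Fraction


def lup_decomposition(a):
    """In-place style LUP decomposition; returns (pi, lu) with P A = L U."""
    n = len(a)
    lu = [[Fraction(x) for x in row] for row in a]
    pi = list(range(n))
    for k in range(n):
        k_prime = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[k_prime][k] == 0:
            raise ValueError("singular matrix")
        pi[k], pi[k_prime] = pi[k_prime], pi[k]
        lu[k], lu[k_prime] = lu[k_prime], lu[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return pi, lu


def lup_solve(lu, pi, b):
    """Solve A x = b given P A = L U, by forward then back substitution."""
    n = len(lu)
    y = [Fraction(0)] * n
    for i in range(n):                        # forward substitution: L y = P b
        y[i] = b[pi[i]] - sum(lu[i][j] * y[j] for j in range(i))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):              # back substitution: U x = y
        tail = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - tail) / lu[i][i]
    return x


def invert(a):
    """Compute the inverse by solving A X_i = e_i for each column X_i."""
    n = len(a)
    pi, lu = lup_decomposition(a)             # Theta(n^3), done once
    columns = []
    for i in range(n):                        # n solves, Theta(n^2) each
        e_i = [Fraction(int(r == i)) for r in range(n)]
        columns.append(lup_solve(lu, pi, e_i))
    return [[columns[j][i] for j in range(n)] for i in range(n)]


A = [[1, 5, 4], [2, 0, 3], [5, 8, 2]]
A_inv = invert(A)
```

Multiplying `A` by `A_inv` yields the identity matrix.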

Matrix  multiplication  and  matrix  inversion 

We  now  show  that  the  theoretical  speedups  obtained  for  matrix  multiplication 
translate  to  speedups  for  matrix  inversion.  In  fact,  we  prove  something  stronger: 
matrix  inversion  is  equivalent  to  matrix  multiplication,  in  the  following  sense. 
If $M(n)$ denotes the time to multiply two $n \times n$ matrices, then we can invert a nonsingular $n \times n$ matrix in time $O(M(n))$. Moreover, if $I(n)$ denotes the time to invert a nonsingular $n \times n$ matrix, then we can multiply two $n \times n$ matrices in time $O(I(n))$. We prove these results as two separate theorems.

Theorem 28.1 (Multiplication is no harder than inversion)

If we can invert an $n \times n$ matrix in time $I(n)$, where $I(n) = \Omega(n^2)$ and $I(n)$ satisfies the regularity condition $I(3n) = O(I(n))$, then we can multiply two $n \times n$ matrices in time $O(I(n))$.

Proof  Let $A$ and $B$ be $n \times n$ matrices whose matrix product $C$ we wish to compute. We define the $3n \times 3n$ matrix $D$ by

$$D = \begin{pmatrix} I_n & A & 0 \\ 0 & I_n & B \\ 0 & 0 & I_n \end{pmatrix} .$$

The inverse of $D$ is

$$D^{-1} = \begin{pmatrix} I_n & -A & AB \\ 0 & I_n & -B \\ 0 & 0 & I_n \end{pmatrix} ,$$

and thus we can compute the product $AB$ by taking the upper right $n \times n$ submatrix of $D^{-1}$.

We can construct matrix $D$ in $\Theta(n^2)$ time, which is $O(I(n))$ because we assume that $I(n) = \Omega(n^2)$, and we can invert $D$ in $O(I(3n)) = O(I(n))$ time, by the regularity condition on $I(n)$. We thus have $M(n) = O(I(n))$. ■

Note that $I(n)$ satisfies the regularity condition whenever $I(n) = \Theta(n^c \lg^d n)$ for any constants $c > 0$ and $d \ge 0$.
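The reduction in the proof of Theorem 28.1 can be tried out directly. In the sketch below, `invert` is a plain Gauss-Jordan routine over the rationals that merely stands in for "any inversion algorithm"; it and the wrapper `multiply_via_inversion` are hypothetical names introduced for this illustration, not part of the text.

```python
from fractions import Fraction


def invert(m):
    """Gauss-Jordan inversion over the rationals (stand-in inversion routine)."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # first nonzero pivot
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def multiply_via_inversion(a, b):
    """Compute A B as the upper right n x n block of D^{-1}."""
    n = len(a)
    zero = [0] * n

    def ident(i):
        return [int(i == j) for j in range(n)]

    # D = [[I, A, 0], [0, I, B], [0, 0, I]], a 3n x 3n matrix.
    d = ([ident(i) + list(a[i]) + zero for i in range(n)] +
         [zero + ident(i) + list(b[i]) for i in range(n)] +
         [zero + zero + ident(i) for i in range(n)])
    d_inv = invert(d)
    return [row[2 * n:] for row in d_inv[:n]]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = multiply_via_inversion(A, B)   # equals [[19, 22], [43, 50]]
```

Building `d` touches $\Theta(n^2)$ entries, so the cost is dominated by the single call to the inversion routine on a $3n \times 3n$ matrix, as in the proof.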

The  proof  that  matrix  inversion  is  no  harder  than  matrix  multiplication  relies 
on  some  properties  of  symmetric  positive-definite  matrices  that  we  will  prove  in 
Section  28.3. 

Theorem 28.2 (Inversion is no harder than multiplication)

Suppose we can multiply two $n \times n$ real matrices in time $M(n)$, where $M(n) = \Omega(n^2)$ and $M(n)$ satisfies the two regularity conditions $M(n + k) = O(M(n))$ for any $k$ in the range $0 \le k \le n$ and $M(n/2) \le cM(n)$ for some constant $c < 1/2$. Then we can compute the inverse of any real nonsingular $n \times n$ matrix in time $O(M(n))$.

Proof  We prove the theorem here for real matrices. Exercise 28.2-6 asks you to generalize the proof for matrices whose entries are complex numbers.

We can assume that $n$ is an exact power of 2, since we have

$$\begin{pmatrix} A & 0 \\ 0 & I_k \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} & 0 \\ 0 & I_k \end{pmatrix}$$

for any $k > 0$. Thus, by choosing $k$ such that $n + k$ is a power of 2, we enlarge the matrix to a size that is the next power of 2 and obtain the desired answer $A^{-1}$ from the answer to the enlarged problem. The first regularity condition on $M(n)$ ensures that this enlargement does not cause the running time to increase by more than a constant factor.

For the moment, let us assume that the $n \times n$ matrix $A$ is symmetric and positive-definite. We partition each of $A$ and its inverse $A^{-1}$ into four $n/2 \times n/2$ submatrices:

$$A = \begin{pmatrix} B & C^T \\ C & D \end{pmatrix} \quad \text{and} \quad A^{-1} = \begin{pmatrix} R & T \\ U & V \end{pmatrix} . \tag{28.11}$$

Then, if we let

$$S = D - C B^{-1} C^T \tag{28.12}$$

be the Schur complement of $A$ with respect to $B$ (we shall see more about this form of Schur complement in Section 28.3), we have

$$A^{-1} = \begin{pmatrix} R & T \\ U & V \end{pmatrix} = \begin{pmatrix} B^{-1} + B^{-1} C^T S^{-1} C B^{-1} & -B^{-1} C^T S^{-1} \\ -S^{-1} C B^{-1} & S^{-1} \end{pmatrix} , \tag{28.13}$$

since $A A^{-1} = I_n$, as you can verify by performing the matrix multiplication. Because $A$ is symmetric and positive-definite, Lemmas 28.4 and 28.5 in Section 28.3 imply that $B$ and $S$ are both symmetric and positive-definite. By Lemma 28.3 in Section 28.3, therefore, the inverses $B^{-1}$ and $S^{-1}$ exist, and by Exercise D.2-6, $B^{-1}$ and $S^{-1}$ are symmetric, so that $(B^{-1})^T = B^{-1}$ and $(S^{-1})^T = S^{-1}$. Therefore, we can compute the submatrices $R$, $T$, $U$, and $V$ of $A^{-1}$ as follows, where all matrices mentioned are $n/2 \times n/2$:

1. Form the submatrices $B$, $C$, $C^T$, and $D$ of $A$.

2. Recursively compute the inverse $B^{-1}$ of $B$.

3. Compute the matrix product $W = C B^{-1}$, and then compute its transpose $W^T$, which equals $B^{-1} C^T$ (by Exercise D.1-2 and $(B^{-1})^T = B^{-1}$).

4. Compute the matrix product $X = W C^T$, which equals $C B^{-1} C^T$, and then compute the matrix $S = D - X = D - C B^{-1} C^T$.

5. Recursively compute the inverse $S^{-1}$ of $S$, and set $V$ to $S^{-1}$.

6. Compute the matrix product $Y = S^{-1} W$, which equals $S^{-1} C B^{-1}$, and then compute its transpose $Y^T$, which equals $B^{-1} C^T S^{-1}$ (by Exercise D.1-2, $(B^{-1})^T = B^{-1}$, and $(S^{-1})^T = S^{-1}$). Set $T$ to $-Y^T$ and $U$ to $-Y$.

7. Compute the matrix product $Z = W^T Y$, which equals $B^{-1} C^T S^{-1} C B^{-1}$, and set $R$ to $B^{-1} + Z$.

Thus, we can invert an $n \times n$ symmetric positive-definite matrix by inverting two $n/2 \times n/2$ matrices in steps 2 and 5; performing four multiplications of $n/2 \times n/2$ matrices in steps 3, 4, 6, and 7; plus an additional cost of $O(n^2)$ for extracting submatrices from $A$, inserting submatrices into $A^{-1}$, and performing a constant number of additions, subtractions, and transposes on $n/2 \times n/2$ matrices. We get the recurrence

$$\begin{aligned}
I(n) &\le 2I(n/2) + 4M(n/2) + O(n^2) \\
     &= 2I(n/2) + \Theta(M(n)) \\
     &= O(M(n)) .
\end{aligned}$$

The second line holds because the second regularity condition in the statement of the theorem implies that $4M(n/2) < 2M(n)$ and because we assume that $M(n) = \Omega(n^2)$. The third line follows because the second regularity condition allows us to apply case 3 of the master theorem (Theorem 4.1).

It remains to prove that we can obtain the same asymptotic running time for matrix multiplication as for matrix inversion when $A$ is invertible but not symmetric and positive-definite. The basic idea is that for any nonsingular matrix $A$, the matrix $A^T A$ is symmetric (by Exercise D.1-2) and positive-definite (by Theorem D.6). The trick, then, is to reduce the problem of inverting $A$ to the problem of inverting $A^T A$.

The reduction is based on the observation that when $A$ is an $n \times n$ nonsingular matrix, we have

$$A^{-1} = (A^T A)^{-1} A^T ,$$

since $((A^T A)^{-1} A^T) A = (A^T A)^{-1} (A^T A) = I_n$ and a matrix inverse is unique. Therefore, we can compute $A^{-1}$ by first multiplying $A^T$ by $A$ to obtain $A^T A$, then inverting the symmetric positive-definite matrix $A^T A$ using the above divide-and-conquer algorithm, and finally multiplying the result by $A^T$. Each of these three steps takes $O(M(n))$ time, and thus we can invert any nonsingular matrix with real entries in $O(M(n))$ time. ■
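The construction in the proof of Theorem 28.2 can be sketched as follows. Several things here are assumptions of the sketch rather than part of the theorem: the helper names (`matmul`, `transpose`, `add`, `spd_inverse`, `inverse`), the use of exact rationals, and above all the naive cubic `matmul`, which only stands in for a fast $M(n)$-time multiplication routine such as Strassen's algorithm.

```python
from fractions import Fraction


def matmul(x, y):
    """Naive matrix product; a placeholder for a fast multiplication routine."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x):
    return [list(col) for col in zip(*x)]


def add(x, y, sign=1):
    """Return x + sign * y, entry by entry."""
    return [[a + sign * b for a, b in zip(r, s)] for r, s in zip(x, y)]


def spd_inverse(a):
    """Invert a symmetric positive-definite matrix whose size is a power of 2."""
    n = len(a)
    if n == 1:
        return [[1 / Fraction(a[0][0])]]
    h = n // 2
    b = [row[:h] for row in a[:h]]            # step 1: A = [[B, C^T], [C, D]]
    c = [row[:h] for row in a[h:]]
    d = [row[h:] for row in a[h:]]
    b_inv = spd_inverse(b)                    # step 2
    w = matmul(c, b_inv)                      # step 3: W = C B^-1
    s = add(d, matmul(w, transpose(c)), -1)   # step 4: S = D - W C^T
    v = spd_inverse(s)                        # step 5: V = S^-1
    y = matmul(v, w)                          # step 6: Y = S^-1 W
    t = [[-e for e in row] for row in transpose(y)]
    u = [[-e for e in row] for row in y]
    r = add(b_inv, matmul(transpose(w), y))   # step 7: R = B^-1 + W^T Y
    return [r[i] + t[i] for i in range(h)] + [u[i] + v[i] for i in range(h)]


def inverse(a):
    """Invert a nonsingular real matrix via A^-1 = (A^T A)^-1 A^T."""
    n = len(a)
    size = 1
    while size < n:                           # pad to the next power of 2
        size *= 2
    padded = [[Fraction(a[i][j]) if i < n and j < n else Fraction(int(i == j))
               for j in range(size)] for i in range(size)]
    at = transpose(padded)
    result = matmul(spd_inverse(matmul(at, padded)), at)
    return [row[:n] for row in result[:n]]    # upper left block is A^-1


A = [[1, 5, 4], [2, 0, 3], [5, 8, 2]]
A_inv = inverse(A)
```

The padding step embeds $A$ in a block-diagonal matrix with an identity block, so the upper left $n \times n$ block of the enlarged inverse is $A^{-1}$.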

The proof of Theorem 28.2 suggests a means of solving the equation $Ax = b$ by using LU decomposition without pivoting, so long as $A$ is nonsingular. We multiply both sides of the equation by $A^T$, yielding $(A^T A)x = A^T b$. This transformation doesn't affect the solution $x$, since $A^T$ is invertible, and so we can factor the symmetric positive-definite matrix $A^T A$ by computing an LU decomposition. We then use forward and back substitution to solve for $x$ with the right-hand side $A^T b$. Although this method is theoretically correct, in practice the procedure LUP-Decomposition works much better. LUP decomposition requires fewer arithmetic operations by a constant factor, and it has somewhat better numerical properties.

Exercises 


28.2-1 

Let $M(n)$ be the time to multiply two $n \times n$ matrices, and let $S(n)$ denote the time required to square an $n \times n$ matrix. Show that multiplying and squaring matrices have essentially the same difficulty: an $M(n)$-time matrix-multiplication algorithm implies an $O(M(n))$-time squaring algorithm, and an $S(n)$-time squaring algorithm implies an $O(S(n))$-time matrix-multiplication algorithm.

28.2-2

Let $M(n)$ be the time to multiply two $n \times n$ matrices, and let $L(n)$ be the time to compute the LUP decomposition of an $n \times n$ matrix. Show that multiplying matrices and computing LUP decompositions of matrices have essentially the same difficulty: an $M(n)$-time matrix-multiplication algorithm implies an $O(M(n))$-time LUP-decomposition algorithm, and an $L(n)$-time LUP-decomposition algorithm implies an $O(L(n))$-time matrix-multiplication algorithm.


28.2-3 

Let $M(n)$ be the time to multiply two $n \times n$ matrices, and let $D(n)$ denote the time required to find the determinant of an $n \times n$ matrix. Show that multiplying matrices and computing the determinant have essentially the same difficulty: an $M(n)$-time matrix-multiplication algorithm implies an $O(M(n))$-time determinant algorithm, and a $D(n)$-time determinant algorithm implies an $O(D(n))$-time matrix-multiplication algorithm.


28.2-4 

Let $M(n)$ be the time to multiply two $n \times n$ boolean matrices, and let $T(n)$ be the time to find the transitive closure of an $n \times n$ boolean matrix. (See Section 25.2.) Show that an $M(n)$-time boolean matrix-multiplication algorithm implies an $O(M(n) \lg n)$-time transitive-closure algorithm, and a $T(n)$-time transitive-closure algorithm implies an $O(T(n))$-time boolean matrix-multiplication algorithm.


28.2-5

Does the matrix-inversion algorithm based on Theorem 28.2 work when matrix elements are drawn from the field of integers modulo 2? Explain.

28.2-6 *

Generalize the matrix-inversion algorithm of Theorem 28.2 to handle matrices of complex numbers, and prove that your generalization works correctly. (Hint: Instead of the transpose of $A$, use the conjugate transpose $A^*$, which you obtain from the transpose of $A$ by replacing every entry with its complex conjugate. Instead of symmetric matrices, consider Hermitian matrices, which are matrices $A$ such that $A = A^*$.)


28.3  Symmetric  positive-definite  matrices  and  least-squares  approximation 

Symmetric positive-definite matrices have many interesting and desirable properties. For example, they are nonsingular, and we can perform LU decomposition on them without having to worry about dividing by 0. In this section, we shall prove several other important properties of symmetric positive-definite matrices and show an interesting application to curve fitting by a least-squares approximation.

The  first  property  we  prove  is  perhaps  the  most  basic. 

Lemma  28.3 

Any  positive-definite  matrix  is  nonsingular. 

Proof  Suppose that a matrix $A$ is singular. Then by Corollary D.3, there exists a nonzero vector $x$ such that $Ax = 0$. Hence, $x^T A x = 0$, and $A$ cannot be positive-definite. ■

The proof that we can perform LU decomposition on a symmetric positive-definite matrix $A$ without dividing by 0 is more involved. We begin by proving properties about certain submatrices of $A$. Define the $k$th leading submatrix of $A$ to be the matrix $A_k$ consisting of the intersection of the first $k$ rows and first $k$ columns of $A$.

Lemma  28.4 

If  A  is  a  symmetric  positive-definite  matrix,  then  every  leading  submatrix  of  A  is 
symmetric  and  positive-definite. 

Proof  That each leading submatrix $A_k$ is symmetric is obvious. To prove that $A_k$ is positive-definite, we assume that it is not and derive a contradiction. If $A_k$ is not positive-definite, then there exists a $k$-vector $x_k \ne 0$ such that $x_k^T A_k x_k \le 0$. Let $A$ be $n \times n$, and

$$A = \begin{pmatrix} A_k & B^T \\ B & C \end{pmatrix} \tag{28.14}$$

for submatrices $B$ (which is $(n - k) \times k$) and $C$ (which is $(n - k) \times (n - k)$). Define the $n$-vector $x = (\, x_k^T \;\; 0 \,)^T$, where $n - k$ 0s follow $x_k$. Then we have

$$\begin{aligned}
x^T A x &= (\, x_k^T \;\; 0 \,) \begin{pmatrix} A_k & B^T \\ B & C \end{pmatrix} \begin{pmatrix} x_k \\ 0 \end{pmatrix} \\
        &= (\, x_k^T \;\; 0 \,) \begin{pmatrix} A_k x_k \\ B x_k \end{pmatrix} \\
        &= x_k^T A_k x_k \\
        &\le 0 ,
\end{aligned}$$

which contradicts $A$ being positive-definite. ■

We now turn to some essential properties of the Schur complement. Let $A$ be a symmetric positive-definite matrix, and let $A_k$ be a leading $k \times k$ submatrix of $A$. Partition $A$ once again according to equation (28.14). We generalize equation (28.9) to define the Schur complement $S$ of $A$ with respect to $A_k$ as

$$S = C - B A_k^{-1} B^T . \tag{28.15}$$

(By Lemma 28.4, $A_k$ is symmetric and positive-definite; therefore, $A_k^{-1}$ exists by Lemma 28.3, and $S$ is well defined.) Note that our earlier definition (28.9) of the Schur complement is consistent with equation (28.15), by letting $k = 1$.

The next lemma shows that the Schur-complement matrices of symmetric positive-definite matrices are themselves symmetric and positive-definite. We used this result in Theorem 28.2, and we need its corollary to prove the correctness of LU decomposition for symmetric positive-definite matrices.

Lemma  28.5  (Schur  complement  lemma) 

If $A$ is a symmetric positive-definite matrix and $A_k$ is a leading $k \times k$ submatrix of $A$, then the Schur complement $S$ of $A$ with respect to $A_k$ is symmetric and positive-definite.


Proof  Because $A$ is symmetric, so is the submatrix $C$. By Exercise D.2-6, the product $B A_k^{-1} B^T$ is symmetric, and by Exercise D.1-1, $S$ is symmetric.

It remains to show that $S$ is positive-definite. Consider the partition of $A$ given in equation (28.14). For any nonzero vector $x$, we have $x^T A x > 0$ by the assumption that $A$ is positive-definite. Let us break $x$ into two subvectors $y$ and $z$ compatible with $A_k$ and $C$, respectively. Because $A_k^{-1}$ exists, we have

$$\begin{aligned}
x^T A x &= (\, y^T \;\; z^T \,) \begin{pmatrix} A_k & B^T \\ B & C \end{pmatrix} \begin{pmatrix} y \\ z \end{pmatrix} \\
        &= (\, y^T \;\; z^T \,) \begin{pmatrix} A_k y + B^T z \\ B y + C z \end{pmatrix} \\
        &= y^T A_k y + y^T B^T z + z^T B y + z^T C z \\
        &= (y + A_k^{-1} B^T z)^T A_k (y + A_k^{-1} B^T z) + z^T (C - B A_k^{-1} B^T) z ,
\end{aligned} \tag{28.16}$$

by matrix magic. (Verify by multiplying through.) This last equation amounts to "completing the square" of the quadratic form. (See Exercise 28.3-2.)

Since $x^T A x > 0$ holds for any nonzero $x$, let us pick any nonzero $z$ and then choose $y = -A_k^{-1} B^T z$, which causes the first term in equation (28.16) to vanish, leaving

$$z^T (C - B A_k^{-1} B^T) z = z^T S z$$

as the value of the expression. For any $z \ne 0$, we therefore have $z^T S z = x^T A x > 0$, and thus $S$ is positive-definite. ■

Corollary 28.6

LU decomposition of a symmetric positive-definite matrix never causes a division by 0.

Proof  Let $A$ be a symmetric positive-definite matrix. We shall prove something stronger than the statement of the corollary: every pivot is strictly positive. The first pivot is $a_{11}$. Let $e_1$ be the first unit vector, from which we obtain $a_{11} = e_1^T A e_1 > 0$. Since the first step of LU decomposition produces the Schur complement of $A$ with respect to $A_1 = (a_{11})$, Lemma 28.5 implies by induction that all pivots are positive. ■
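Corollary 28.6 is easy to observe numerically. The following sketch, with the hypothetical helper `lu_pivots` and a small hand-picked matrix (both introduced only for this illustration), runs elimination without pivoting and records each pivot. For the symmetric positive-definite input every pivot is positive, whereas the symmetric but indefinite second input produces a negative pivot.

```python
from fractions import Fraction


def lu_pivots(a):
    """Return the pivots u_kk produced by LU decomposition without pivoting."""
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    pivots = []
    for k in range(n):
        pivots.append(a[k][k])
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]     # raises ZeroDivisionError on a zero pivot
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]   # Schur complement update
    return pivots


spd = [[4, 2, 2],
       [2, 5, 3],
       [2, 3, 6]]
indefinite = [[1, 2],
              [2, 1]]

spd_pivots = lu_pivots(spd)                # [4, 4, 4]
indefinite_pivots = lu_pivots(indefinite)  # [1, -3]
```

Each pivot after the first is the upper left entry of a Schur complement, which is why Lemma 28.5 applies at every step.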

Least-squares  approximation 

One  important  application  of  symmetric  positive-definite  matrices  arises  in  fitting 
curves  to  given  sets  of  data  points.  Suppose  that  we  are  given  a  set  of  m  data  points 

$$(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m) ,$$

where we know that the $y_i$ are subject to measurement errors. We would like to determine a function $F(x)$ such that the approximation errors

$$\eta_i = F(x_i) - y_i \tag{28.17}$$

are small for $i = 1, 2, \ldots, m$. The form of the function $F$ depends on the problem at hand. Here, we assume that it has the form of a linearly weighted sum,

$$F(x) = \sum_{j=1}^{n} c_j f_j(x) ,$$

where the number of summands $n$ and the specific basis functions $f_j$ are chosen based on knowledge of the problem at hand. A common choice is $f_j(x) = x^{j-1}$, which means that

$$F(x) = c_1 + c_2 x + c_3 x^2 + \cdots + c_n x^{n-1}$$

is a polynomial of degree $n - 1$ in $x$. Thus, given $m$ data points $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$, we wish to calculate $n$ coefficients $c_1, c_2, \ldots, c_n$ that minimize the approximation errors $\eta_1, \eta_2, \ldots, \eta_m$.

By choosing $n = m$, we can calculate each $y_i$ exactly in equation (28.17). Such a high-degree $F$ "fits the noise" as well as the data, however, and generally gives poor results when used to predict $y$ for previously unseen values of $x$. It is usually better to choose $n$ significantly smaller than $m$ and hope that by choosing the coefficients $c_j$ well, we can obtain a function $F$ that finds the significant patterns in the data points without paying undue attention to the noise. Some theoretical principles exist for choosing $n$, but they are beyond the scope of this text. In any case, once we choose a value of $n$ that is less than $m$, we end up with an overdetermined set of equations whose solution we wish to approximate. We now show how to do so.

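Let

$$A = \begin{pmatrix}
f_1(x_1) & f_2(x_1) & \cdots & f_n(x_1) \\
f_1(x_2) & f_2(x_2) & \cdots & f_n(x_2) \\
\vdots & \vdots & \ddots & \vdots \\
f_1(x_m) & f_2(x_m) & \cdots & f_n(x_m)
\end{pmatrix}$$

denote the matrix of values of the basis functions at the given points; that is,
$a_{ij} = f_j(x_i)$. Let $c = (c_k)$ denote the desired $n$-vector of coefficients. Then,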


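$$Ac = \begin{pmatrix}
f_1(x_1) & f_2(x_1) & \cdots & f_n(x_1) \\
f_1(x_2) & f_2(x_2) & \cdots & f_n(x_2) \\
\vdots & \vdots & \ddots & \vdots \\
f_1(x_m) & f_2(x_m) & \cdots & f_n(x_m)
\end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}
= \begin{pmatrix} F(x_1) \\ F(x_2) \\ \vdots \\ F(x_m) \end{pmatrix}$$

is the $m$-vector of “predicted values” for $y$. Thus,

$$\eta = Ac - y$$

is the $m$-vector of approximation errors.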

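To minimize approximation errors, we choose to minimize the norm of the error
vector $\eta$, which gives us a least-squares solution, since

$$\|\eta\| = \left( \sum_{i=1}^{m} \eta_i^2 \right)^{1/2} .$$

Because

$$\|\eta\|^2 = \|Ac - y\|^2 = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} a_{ij} c_j - y_i \right)^2 ,$$

we can minimize $\|\eta\|$ by differentiating $\|\eta\|^2$ with respect to each $c_k$ and then
setting the result to 0:

$$\frac{d\,\|\eta\|^2}{d c_k} = \sum_{i=1}^{m} 2 \left( \sum_{j=1}^{n} a_{ij} c_j - y_i \right) a_{ik} = 0 . \tag{28.18}$$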


The $n$ equations (28.18) for $k = 1, 2, \ldots, n$ are equivalent to the single matrix
equation

$$(Ac - y)^{\mathrm T} A = 0$$

or, equivalently (using Exercise D.1-2), to

$$A^{\mathrm T} (Ac - y) = 0 ,$$

which implies

$$A^{\mathrm T} A c = A^{\mathrm T} y . \tag{28.19}$$

In statistics, this is called the normal equation. The matrix $A^{\mathrm T} A$ is symmetric
by Exercise D.1-2, and if $A$ has full column rank, then by Theorem D.6, $A^{\mathrm T} A$
is positive-definite as well. Hence, $(A^{\mathrm T} A)^{-1}$ exists, and the solution to
equation (28.19) is

$$c = \left( (A^{\mathrm T} A)^{-1} A^{\mathrm T} \right) y = A^+ y , \tag{28.20}$$

where the matrix $A^+ = ((A^{\mathrm T} A)^{-1} A^{\mathrm T})$ is the pseudoinverse of the matrix $A$. The
pseudoinverse naturally generalizes the notion of a matrix inverse to the case in
which $A$ is not square. (Compare equation (28.20) as the approximate solution to
$Ac = y$ with the solution $A^{-1} b$ as the exact solution to $Ax = b$.)

As an example of producing a least-squares fit, suppose that we have five data
points

$$(x_1, y_1) = (-1, 2) ,$$
$$(x_2, y_2) = (1, 1) ,$$
$$(x_3, y_3) = (2, 1) ,$$
$$(x_4, y_4) = (3, 0) ,$$
$$(x_5, y_5) = (5, 3) ,$$

shown as black dots in Figure 28.3. We wish to fit these points with a quadratic
polynomial

$$F(x) = c_1 + c_2 x + c_3 x^2 .$$

We start with the matrix of basis-function values

$$A = \begin{pmatrix}
1 & -1 & 1 \\
1 & 1 & 1 \\
1 & 2 & 4 \\
1 & 3 & 9 \\
1 & 5 & 25
\end{pmatrix} ,$$

whose pseudoinverse is

$$A^+ = \begin{pmatrix}
0.500 & 0.300 & 0.200 & 0.100 & -0.100 \\
-0.388 & 0.093 & 0.190 & 0.193 & -0.088 \\
0.060 & -0.036 & -0.048 & -0.036 & 0.060
\end{pmatrix} .$$

Multiplying $y$ by $A^+$, we obtain the coefficient vector

$$c = \begin{pmatrix} 1.200 \\ -0.757 \\ 0.214 \end{pmatrix} ,$$

which corresponds to the quadratic polynomial

$$F(x) = 1.200 - 0.757x + 0.214x^2$$

as the closest-fitting quadratic to the given data, in a least-squares sense.

Figure 28.3 The least-squares fit of a quadratic polynomial to the set of five data points
{(-1, 2), (1, 1), (2, 1), (3, 0), (5, 3)}. The black dots are the data points, and the white dots are their
estimated values predicted by the polynomial $F(x) = 1.2 - 0.757x + 0.214x^2$, the quadratic
polynomial that minimizes the sum of the squared errors. Each shaded line shows the error for one data
point.
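
As a cross-check on this example, the normal equation (28.19) can be written out explicitly for the five data points. The entries below are just the sums $\sum 1$, $\sum x_i$, $\sum x_i^2$, $\sum x_i^3$, $\sum x_i^4$ and $\sum y_i$, $\sum x_i y_i$, $\sum x_i^2 y_i$ computed from the data; the elimination steps are one possible way to solve the system and are not taken from the text.

$$A^{\mathrm T} A = \begin{pmatrix} 5 & 10 & 40 \\ 10 & 40 & 160 \\ 40 & 160 & 724 \end{pmatrix} , \qquad
A^{\mathrm T} y = \begin{pmatrix} 7 \\ 16 \\ 82 \end{pmatrix} .$$

Subtracting 2 times the first equation from the second gives $20 c_2 + 80 c_3 = 2$, and subtracting 8 times the first from the third gives $80 c_2 + 404 c_3 = 26$. Eliminating $c_2$ yields $84 c_3 = 18$, so

$$c_3 = \tfrac{3}{14} \approx 0.214 , \qquad c_2 = \tfrac{1}{10} - 4 c_3 = -\tfrac{53}{70} \approx -0.757 , \qquad c_1 = \tfrac{1}{5}\,(7 - 10 c_2 - 40 c_3) = \tfrac{6}{5} = 1.2 ,$$

which agrees with the coefficient vector obtained from the pseudoinverse.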

As a practical matter, we solve the normal equation (28.19) by multiplying $y$
by $A^{\mathrm T}$ and then finding an LU decomposition of $A^{\mathrm T} A$. If $A$ has full rank, the
matrix $A^{\mathrm T} A$ is guaranteed to be nonsingular, because it is symmetric and positive-
definite. (See Exercise D.1-2 and Theorem D.6.)
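
The procedure can be sketched in Python as follows. This is a minimal illustration, not the book's LU-DECOMPOSITION routine: it forms $A^{\mathrm T} A$ and $A^{\mathrm T} y$ and then solves the normal equation by ordinary Gaussian elimination with partial pivoting, in exact rational arithmetic. The function names `solve_linear_system` and `least_squares_fit` are hypothetical names chosen for this sketch.

```python
from fractions import Fraction


def solve_linear_system(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting.

    Works in exact rational arithmetic; M must be square and nonsingular.
    """
    n = len(M)
    # Augmented matrix [M | rhs] with Fraction entries.
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    z = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def least_squares_fit(xs, ys, basis):
    """Return coefficients c solving the normal equation (A^T A) c = A^T y."""
    A = [[f(x) for f in basis] for x in xs]          # a_ij = f_j(x_i)
    n = len(basis)
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    Aty = [sum(row[i] * y for row, y in zip(A, ys)) for i in range(n)]
    return solve_linear_system(AtA, Aty)


# The five data points of the example, fitted with the basis 1, x, x^2.
xs = [-1, 1, 2, 3, 5]
ys = [2, 1, 1, 0, 3]
basis = [lambda x: 1, lambda x: x, lambda x: x * x]
c = least_squares_fit(xs, ys, basis)
print([float(v) for v in c])
```

For the example data the exact coefficients are 6/5, -53/70, and 3/14, matching the rounded values 1.200, -0.757, and 0.214. In floating-point practice, forming $A^{\mathrm T} A$ explicitly can lose accuracy, so treat this only as an illustration of the algebra.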

Exercises 


28.3-1 

Prove that every diagonal element of a symmetric positive-definite matrix is positive.


28.3-2

Let $A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ be a $2 \times 2$ symmetric positive-definite matrix. Prove that its
determinant $ac - b^2$ is positive by “completing the square” in a manner similar to
that used in the proof of Lemma 28.5.


28.3-3 

Prove  that  the  maximum  element  in  a  symmetric  positive-definite  matrix  lies  on 
the  diagonal. 


28.3-4 

Prove  that  the  determinant  of  each  leading  submatrix  of  a  symmetric  positive- 
definite  matrix  is  positive. 


28.3-5 

Let $A_k$ denote the $k$th leading submatrix of a symmetric positive-definite matrix $A$.
Prove that $\det(A_k) / \det(A_{k-1})$ is the $k$th pivot during LU decomposition, where,
by convention, $\det(A_0) = 1$.


28.3-6 

Find the function of the form

$$F(x) = c_1 + c_2 x \lg x + c_3 e^x$$

that  is  the  best  least-squares  fit  to  the  data  points 

(1,1),  (2,1),  (3, 3),  (4,  8)  . 

28.3-7

Show that the pseudoinverse $A^+$ satisfies the following four equations:

$$A A^+ A = A ,$$

$$A^+ A A^+ = A^+ ,$$

$$(A A^+)^{\mathrm T} = A A^+ ,$$

$$(A^+ A)^{\mathrm T} = A^+ A .$$


Problems 


28-1  Tridiagonal  systems  of  linear  equations 
Consider  the  tridiagonal  matrix 


$$A = \begin{pmatrix}
1 & -1 & 0 & 0 & 0 \\
-1 & 2 & -1 & 0 & 0 \\
0 & -1 & 2 & -1 & 0 \\
0 & 0 & -1 & 2 & -1 \\
0 & 0 & 0 & -1 & 2
\end{pmatrix} .$$

a.  Find  an  LU  decomposition  of  A. 


b. Solve the equation Ax = ( 1 ... ) by using forward and back substitution.


c.  Find  the  inverse  of  A. 

d. Show how, for any $n \times n$ symmetric positive-definite, tridiagonal matrix $A$ and
any $n$-vector $b$, to solve the equation $Ax = b$ in $O(n)$ time by performing an
LU decomposition. Argue that any method based on forming $A^{-1}$ is asymptotically
more expensive in the worst case.

e. Show how, for any $n \times n$ nonsingular, tridiagonal matrix $A$ and any $n$-vector $b$, to
solve the equation $Ax = b$ in $O(n)$ time by performing an LUP decomposition.


28-2  Splines 

A practical method for interpolating a set of points with a curve is to use cubic
splines. We are given a set $\{(x_i, y_i) : i = 0, 1, \ldots, n\}$ of $n + 1$ point-value
pairs, where $x_0 < x_1 < \cdots < x_n$. We wish to fit a piecewise-cubic curve
(spline) $f(x)$ to the points. That is, the curve $f(x)$ is made up of $n$ cubic polynomials
$f_i(x) = a_i + b_i x + c_i x^2 + d_i x^3$ for $i = 0, 1, \ldots, n - 1$, where if $x$ falls in
the range $x_i \le x \le x_{i+1}$, then the value of the curve is given by $f(x) = f_i(x - x_i)$.
The points $x_i$ at which the cubic polynomials are “pasted” together are called knots.
For simplicity, we shall assume that $x_i = i$ for $i = 0, 1, \ldots, n$.

To ensure continuity of $f(x)$, we require that

$$f(x_i) = f_i(0) = y_i ,$$

$$f(x_{i+1}) = f_i(1) = y_{i+1}$$

for $i = 0, 1, \ldots, n - 1$. To ensure that $f(x)$ is sufficiently smooth, we also insist
that the first derivative be continuous at each knot:

$$f'(x_{i+1}) = f_i'(1) = f_{i+1}'(0)$$

for $i = 0, 1, \ldots, n - 2$.

a. Suppose that for $i = 0, 1, \ldots, n$, we are given not only the point-value pairs
$\{(x_i, y_i)\}$ but also the first derivatives $D_i = f'(x_i)$ at each knot. Express each
coefficient $a_i$, $b_i$, $c_i$, and $d_i$ in terms of the values $y_i$, $y_{i+1}$, $D_i$, and $D_{i+1}$.
(Remember that $x_i = i$.) How quickly can we compute the $4n$ coefficients
from the point-value pairs and first derivatives?

The question remains of how to choose the first derivatives of $f(x)$ at the knots.
One method is to require the second derivatives to be continuous at the knots:

$$f''(x_{i+1}) = f_i''(1) = f_{i+1}''(0)$$

for $i = 0, 1, \ldots, n - 2$. At the first and last knots, we assume that $f''(x_0) =
f_0''(0) = 0$ and $f''(x_n) = f_{n-1}''(1) = 0$; these assumptions make $f(x)$ a natural
cubic spline.

b. Use the continuity constraints on the second derivative to show that for $i =
1, 2, \ldots, n - 1$,

$$D_{i-1} + 4 D_i + D_{i+1} = 3(y_{i+1} - y_{i-1}) . \tag{28.21}$$

c. Show that

$$2 D_0 + D_1 = 3(y_1 - y_0) , \tag{28.22}$$

$$D_{n-1} + 2 D_n = 3(y_n - y_{n-1}) . \tag{28.23}$$

d. Rewrite equations (28.21)-(28.23) as a matrix equation involving the vector
$D = (D_0, D_1, \ldots, D_n)$ of unknowns. What attributes does the matrix in your
equation have?

e. Argue that a natural cubic spline can interpolate a set of $n + 1$ point-value pairs
in $O(n)$ time (see Problem 28-1).

f. Show how to determine a natural cubic spline that interpolates a set of $n + 1$
points $(x_i, y_i)$ satisfying $x_0 < x_1 < \cdots < x_n$, even when $x_i$ is not necessarily
equal to $i$. What matrix equation must your method solve, and how quickly
does your algorithm run?


Chapter  notes 

Many excellent texts describe numerical and scientific computation in much greater
detail than we have room for here. The following are especially readable: George
and Liu [132], Golub and Van Loan [144], Press, Teukolsky, Vetterling, and Flannery
[283, 284], and Strang [323, 324].

Golub and Van Loan [144] discuss numerical stability. They show why $\det(A)$
is not necessarily a good indicator of the stability of a matrix $A$, proposing instead
to use $\|A\|_\infty \|A^{-1}\|_\infty$, where $\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|$. They also address
the question of how to compute this value without actually computing $A^{-1}$.

Gaussian elimination, upon which the LU and LUP decompositions are based,
was the first systematic method for solving linear systems of equations. It was also
one of the earliest numerical algorithms. Although it was known earlier, its discovery
is commonly attributed to C. F. Gauss (1777-1855). In his famous paper
[325], Strassen showed that an $n \times n$ matrix can be inverted in $O(n^{\lg 7})$ time. Winograd
[358] originally proved that matrix multiplication is no harder than matrix
inversion, and the converse is due to Aho, Hopcroft, and Ullman [5].

Another important matrix decomposition is the singular value decomposition,
or SVD. The SVD factors an $m \times n$ matrix $A$ into $A = Q_1 \Sigma Q_2^{\mathrm T}$, where $\Sigma$ is an
$m \times n$ matrix with nonzero values only on the diagonal, $Q_1$ is $m \times m$ with mutually
orthonormal columns, and $Q_2$ is $n \times n$, also with mutually orthonormal columns.
Two vectors are orthonormal if their inner product is 0 and each vector has a norm
of 1. The books by Strang [323, 324] and Golub and Van Loan [144] contain good
treatments of the SVD.

Strang [324] has an excellent presentation of symmetric positive-definite matrices
and of linear algebra in general.

Many problems take the form of maximizing or minimizing an objective, given
limited resources and competing constraints. If we can specify the objective as
a linear function of certain variables, and if we can specify the constraints on
resources as equalities or inequalities on those variables, then we have a linear-
programming problem. Linear programs arise in a variety of practical applications.
We begin by studying an application in electoral politics.

A  political  problem 

Suppose  that  you  are  a  politician  trying  to  win  an  election.  Your  district  has  three 
different types of areas: urban, suburban, and rural. These areas have, respectively,
100,000, 200,000, and 50,000 registered voters. Although not all the registered
voters actually go to the polls, you decide that to govern effectively, you
would  like  at  least  half  the  registered  voters  in  each  of  the  three  regions  to  vote  for 
you.  You  are  honorable  and  would  never  consider  supporting  policies  in  which  you 
do  not  believe.  You  realize,  however,  that  certain  issues  may  be  more  effective  in 
winning  votes  in  certain  places.  Your  primary  issues  are  building  more  roads,  gun 
control,  farm  subsidies,  and  a  gasoline  tax  dedicated  to  improved  public  transit. 
According  to  your  campaign  staff’s  research,  you  can  estimate  how  many  votes 
you win or lose from each population segment by spending $1,000 on advertising
on  each  issue.  This  information  appears  in  the  table  of  Figure  29.1.  In  this  table, 
each  entry  indicates  the  number  of  thousands  of  either  urban,  suburban,  or  rural 
voters  who  would  be  won  over  by  spending  $1,000  on  advertising  in  support  of  a 
particular  issue.  Negative  entries  denote  votes  that  would  be  lost.  Your  task  is  to 
figure  out  the  minimum  amount  of  money  that  you  need  to  spend  in  order  to  win 
50,000  urban  votes,  100,000  suburban  votes,  and  25,000  rural  votes. 

You could, by trial and error, devise a strategy that wins the required number
of votes, but the strategy you come up with might not be the least expensive one.
For example, you could devote $20,000 of advertising to building roads, $0 to gun
control, $4,000 to farm subsidies, and $9,000 to a gasoline tax. In this case, you
would win 20(-2) + 0(8) + 4(0) + 9(10) = 50 thousand urban votes, 20(5) + 0(2) +
4(0) + 9(0) = 100 thousand suburban votes, and 20(3) + 0(-5) + 4(10) + 9(-2) =
82 thousand rural votes. You would win the exact number of votes desired in the
urban and suburban areas and more than enough votes in the rural area. (In fact,
in the rural area, you would receive more votes than there are voters.) In order to
garner these votes, you would have paid for 20 + 0 + 4 + 9 = 33 thousand dollars
of advertising.

policy            urban   suburban   rural
build roads         -2        5         3
gun control          8        2        -5
farm subsidies       0        0        10
gasoline tax        10        0        -2

Figure 29.1 The effects of policies on voters. Each entry describes the number of thousands of
urban, suburban, or rural voters who could be won over by spending $1,000 on advertising in support
of a policy on a particular issue. Negative entries denote votes that would be lost.
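
The arithmetic for this trial strategy is easy to check mechanically. The sketch below tallies the votes from the entries of Figure 29.1; the dictionary layout and the function name `votes_won` are choices made for this illustration only.

```python
# Thousands of votes gained per $1,000 of advertising, from Figure 29.1.
# Each tuple is (urban, suburban, rural).
EFFECT = {
    "build roads":    (-2, 5, 3),
    "gun control":    (8, 2, -5),
    "farm subsidies": (0, 0, 10),
    "gasoline tax":   (10, 0, -2),
}


def votes_won(spending):
    """Return (urban, suburban, rural) votes in thousands.

    `spending` maps each policy to thousands of dollars of advertising.
    """
    totals = [0, 0, 0]
    for policy, thousands in spending.items():
        for k, per_thousand in enumerate(EFFECT[policy]):
            totals[k] += thousands * per_thousand
    return tuple(totals)


trial = {"build roads": 20, "gun control": 0, "farm subsidies": 4, "gasoline tax": 9}
print(votes_won(trial), sum(trial.values()))
```

The tally is (50, 100, 82) thousand votes for 33 thousand dollars, as computed in the text.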

Naturally,  you  may  wonder  whether  this  strategy  is  the  best  possible.  That  is, 
could  you  achieve  your  goals  while  spending  less  on  advertising?  Additional  trial 
and  error  might  help  you  to  answer  this  question,  but  wouldn’t  you  rather  have  a 
systematic  method  for  answering  such  questions?  In  order  to  develop  one,  we  shall 
formulate  this  question  mathematically.  We  introduce  4  variables: 

• $x_1$ is the number of thousands of dollars spent on advertising on building roads,

• $x_2$ is the number of thousands of dollars spent on advertising on gun control,

• $x_3$ is the number of thousands of dollars spent on advertising on farm subsidies,
and

• $x_4$ is the number of thousands of dollars spent on advertising on a gasoline tax.

We can write the requirement that we win at least 50,000 urban votes as

$$-2x_1 + 8x_2 + 0x_3 + 10x_4 \ge 50 . \tag{29.1}$$

Similarly, we can write the requirements that we win at least 100,000 suburban
votes and 25,000 rural votes as

$$5x_1 + 2x_2 + 0x_3 + 0x_4 \ge 100 \tag{29.2}$$

and

$$3x_1 - 5x_2 + 10x_3 - 2x_4 \ge 25 . \tag{29.3}$$

Any setting of the variables $x_1, x_2, x_3, x_4$ that satisfies inequalities (29.1)-(29.3)
yields a strategy that wins a sufficient number of each type of vote. In order to
keep costs as small as possible, you would like to minimize the amount spent on
advertising. That is, you want to minimize the expression

$$x_1 + x_2 + x_3 + x_4 . \tag{29.4}$$

Although negative advertising often occurs in political campaigns, there is no such
thing as negative-cost advertising. Consequently, we require that

$$x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ \text{and}\ x_4 \ge 0 . \tag{29.5}$$

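Combining inequalities (29.1)-(29.3) and (29.5) with the objective of minimizing
(29.4), we obtain what is known as a “linear program.” We format this problem
as

$$\begin{array}{lrcrcrcrcll}
\text{minimize} & x_1 & + & x_2 & + & x_3 & + & x_4 & & & (29.6) \\
\text{subject to} & & & & & & & & & & \\
 & -2x_1 & + & 8x_2 & + & 0x_3 & + & 10x_4 & \ge & 50 & (29.7) \\
 & 5x_1 & + & 2x_2 & + & 0x_3 & + & 0x_4 & \ge & 100 & (29.8) \\
 & 3x_1 & - & 5x_2 & + & 10x_3 & - & 2x_4 & \ge & 25 & (29.9) \\
 & \multicolumn{7}{r}{x_1, x_2, x_3, x_4} & \ge & 0 . & (29.10)
\end{array}$$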

The  solution  of  this  linear  program  yields  your  optimal  strategy. 
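
To make the notions of constraint and objective concrete, the following sketch checks candidate strategies against (29.7)-(29.10) in exact arithmetic. The second candidate, (2050/111, 425/111, 0, 625/111), is supplied here purely as an assumption for illustration; the code verifies only that it is feasible and cheaper than the trial-and-error strategy, not that it is optimal. The names `is_feasible` and `cost` are hypothetical.

```python
from fractions import Fraction as Fr

# Coefficient rows of constraints (29.7)-(29.9); each row must be >= its RHS.
ROWS = [(-2, 8, 0, 10), (5, 2, 0, 0), (3, -5, 10, -2)]
RHS = [50, 100, 25]


def is_feasible(x):
    """True if x satisfies (29.7)-(29.9) and the nonnegativity constraints (29.10)."""
    if any(v < 0 for v in x):
        return False
    return all(sum(a * v for a, v in zip(row, x)) >= r for row, r in zip(ROWS, RHS))


def cost(x):
    """Objective (29.6): total advertising spend in thousands of dollars."""
    return sum(x)


trial = (20, 0, 4, 9)                                            # strategy from the text
candidate = (Fr(2050, 111), Fr(425, 111), Fr(0), Fr(625, 111))   # assumed for illustration

for name, x in (("trial", trial), ("candidate", candidate)):
    print(name, is_feasible(x), float(cost(x)))
```

Both strategies are feasible, and the candidate costs 3100/111 (about 27.93) thousand dollars, which shows that the trial strategy's cost of 33 is not the minimum.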

General  linear  programs 

In the general linear-programming problem, we wish to optimize a linear function
subject to a set of linear inequalities. Given a set of real numbers $a_1, a_2, \ldots, a_n$ and
a set of variables $x_1, x_2, \ldots, x_n$, we define a linear function $f$ on those variables
by

$$f(x_1, x_2, \ldots, x_n) = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = \sum_{j=1}^{n} a_j x_j .$$

If $b$ is a real number and $f$ is a linear function, then the equation

$$f(x_1, x_2, \ldots, x_n) = b$$

is a linear equality and the inequalities

$$f(x_1, x_2, \ldots, x_n) \le b$$

and

$$f(x_1, x_2, \ldots, x_n) \ge b$$

are linear inequalities. We use the general term linear constraints to denote either
linear equalities or linear inequalities. In linear programming, we do not allow
strict inequalities. Formally, a linear-programming problem is the problem of
either minimizing or maximizing a linear function subject to a finite set of linear
constraints. If we are to minimize, then we call the linear program a minimization
linear program, and if we are to maximize, then we call the linear program a
maximization linear program.

The remainder of this chapter covers how to formulate and solve linear programs.
Although several polynomial-time algorithms for linear programming have
been  developed,  we  will  not  study  them  in  this  chapter.  Instead,  we  shall  study  the 
simplex  algorithm,  which  is  the  oldest  linear-programming  algorithm.  The  simplex 
algorithm  does  not  run  in  polynomial  time  in  the  worst  case,  but  it  is  fairly  efficient 
and  widely  used  in  practice. 

An  overview  of  linear  programming 

In order to describe properties of and algorithms for linear programs, we find it
convenient to express them in canonical forms. We shall use two forms, standard
and slack, in this chapter. We will define them precisely in Section 29.1. Informally,
a linear program in standard form is the maximization of a linear function
subject to linear inequalities, whereas a linear program in slack form is the maximization
of a linear function subject to linear equalities. We shall typically use
standard form for expressing linear programs, but we find it more convenient to
use slack form when we describe the details of the simplex algorithm. For now, we
restrict our attention to maximizing a linear function on $n$ variables subject to a set
of $m$ linear inequalities.

Let us first consider the following linear program with two variables:

$$\begin{array}{lrcll}
\text{maximize} & x_1 + x_2 & & & (29.11) \\
\text{subject to} & & & & \\
 & 4x_1 - x_2 & \le & 8 & (29.12) \\
 & 2x_1 + x_2 & \le & 10 & (29.13) \\
 & 5x_1 - 2x_2 & \ge & -2 & (29.14) \\
 & x_1, x_2 & \ge & 0 . & (29.15)
\end{array}$$

We call any setting of the variables $x_1$ and $x_2$ that satisfies all the constraints
(29.12)-(29.15) a feasible solution to the linear program. If we graph the constraints
in the $(x_1, x_2)$-Cartesian coordinate system, as in Figure 29.2(a), we see
that the set of feasible solutions (shaded in the figure) forms a convex region [footnote 1] in
the two-dimensional space. We call this convex region the feasible region and the
function we wish to maximize the objective function. Conceptually, we could evaluate
the objective function $x_1 + x_2$ at each point in the feasible region; we call the
value of the objective function at a particular point the objective value. We could
then identify a point that has the maximum objective value as an optimal solution.
For this example (and for most linear programs), the feasible region contains an
infinite number of points, and so we need to determine an efficient way to find a
point that achieves the maximum objective value without explicitly evaluating the
objective function at every point in the feasible region.

Figure 29.2 (a) The linear program given in (29.12)-(29.15). Each constraint is represented by
a line and a direction. The intersection of the constraints, which is the feasible region, is shaded.
(b) The dotted lines show, respectively, the points for which the objective value is 0, 4, and 8. The
optimal solution to the linear program is $x_1 = 2$ and $x_2 = 6$ with objective value 8.

In two dimensions, we can optimize via a graphical procedure. The set of points
for which $x_1 + x_2 = z$, for any $z$, is a line with a slope of $-1$. If we plot $x_1 + x_2 = 0$,
we obtain the line with slope $-1$ through the origin, as in Figure 29.2(b). The
intersection of this line and the feasible region is the set of feasible solutions that
have an objective value of 0. In this case, that intersection of the line with the
feasible region is the single point (0, 0). More generally, for any $z$, the intersection
of the line $x_1 + x_2 = z$ and the feasible region is the set of feasible solutions that
have objective value $z$. Figure 29.2(b) shows the lines $x_1 + x_2 = 0$, $x_1 + x_2 = 4$,
and $x_1 + x_2 = 8$. Because the feasible region in Figure 29.2 is bounded, there
must be some maximum value $z$ for which the intersection of the line $x_1 + x_2 = z$
and the feasible region is nonempty. Any point at which this occurs is an optimal
solution to the linear program, which in this case is the point $x_1 = 2$ and $x_2 = 6$
with objective value 8.

Footnote 1: An intuitive definition of a convex region is that it fulfills the requirement that for any two points in
the region, all points on a line segment between them are also in the region.

It is no accident that an optimal solution to the linear program occurs at a vertex
of  the  feasible  region.  The  maximum  value  of  z  for  which  the  line  X\  +  x2  =  z 
intersects  the  feasible  region  must  be  on  the  boundary  of  the  feasible  region,  and 
thus  the  intersection  of  this  line  with  the  boundary  of  the  feasible  region  is  either  a 
single  vertex  or  a  line  segment.  If  the  intersection  is  a  single  vertex,  then  there  is 
just  one  optimal  solution,  and  it  is  that  vertex.  If  the  intersection  is  a  line  segment, 
every  point  on  that  line  segment  must  have  the  same  objective  value;  in  particular, 
both  endpoints  of  the  line  segment  are  optimal  solutions.  Since  each  endpoint  of  a 
line  segment  is  a  vertex,  there  is  an  optimal  solution  at  a  vertex  in  this  case  as  well. 
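
Because an optimal solution occurs at a vertex, a two-variable program with a bounded, nonempty feasible region can be solved by brute force: intersect every pair of constraint boundary lines, keep the feasible intersection points, and take the one with the largest objective value. The sketch below does this for the program (29.11)-(29.15). It is an illustration of the geometric argument only, not the simplex algorithm, and the names `intersection`, `feasible`, and `best_vertex` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is (a1, a2, b), meaning a1*x1 + a2*x2 <= b.
CONSTRAINTS = [
    (4, -1, 8),    # (29.12)
    (2, 1, 10),    # (29.13)
    (-5, 2, 2),    # (29.14), rewritten from 5*x1 - 2*x2 >= -2
    (-1, 0, 0),    # x1 >= 0
    (0, -1, 0),    # x2 >= 0
]


def intersection(c1, c2):
    """Point where two constraint boundary lines meet, or None if they are parallel."""
    a, b, e = c1
    c, d, f = c2
    det = a * d - b * c
    if det == 0:
        return None
    # Cramer's rule for a*x + b*y = e, c*x + d*y = f.
    return (Fraction(e * d - b * f, det), Fraction(a * f - e * c, det))


def feasible(p):
    """True if point p satisfies every constraint."""
    return all(a1 * p[0] + a2 * p[1] <= b for a1, a2, b in CONSTRAINTS)


def best_vertex(objective=(1, 1)):
    """Return the feasible vertex maximizing objective[0]*x1 + objective[1]*x2."""
    vertices = set()
    for c1, c2 in combinations(CONSTRAINTS, 2):
        p = intersection(c1, c2)
        if p is not None and feasible(p):
            vertices.add(p)
    return max(vertices, key=lambda p: objective[0] * p[0] + objective[1] * p[1])


x1, x2 = best_vertex()
print(x1, x2, x1 + x2)
```

For this program the best vertex is (2, 6) with objective value 8. The approach examines every pair of constraints, so it does not scale to many variables or constraints; it also assumes the feasible region is bounded and nonempty.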

Although  we  cannot  easily  graph  linear  programs  with  more  than  two  variables, 
the same intuition holds. If we have three variables, then each constraint corresponds
to a half-space in three-dimensional space. The intersection of these half-spaces
forms the feasible region. The set of points for which the objective function
obtains  a  given  value  z  is  now  a  plane  (assuming  no  degenerate  conditions).  If  all 
coefficients  of  the  objective  function  are  nonnegative,  and  if  the  origin  is  a  feasible 
solution  to  the  linear  program,  then  as  we  move  this  plane  away  from  the  origin,  in 
a  direction  normal  to  the  objective  function,  we  find  points  of  increasing  objective 
value.  (If  the  origin  is  not  feasible  or  if  some  coefficients  in  the  objective  function 
are  negative,  the  intuitive  picture  becomes  slightly  more  complicated.)  As  in  two 
dimensions,  because  the  feasible  region  is  convex,  the  set  of  points  that  achieve 
the optimal objective value must include a vertex of the feasible region. Similarly,
if we have $n$ variables, each constraint defines a half-space in $n$-dimensional
space.  We  call  the  feasible  region  formed  by  the  intersection  of  these  half-spaces  a 
simplex.  The  objective  function  is  now  a  hyperplane  and,  because  of  convexity,  an 
optimal  solution  still  occurs  at  a  vertex  of  the  simplex. 

The simplex algorithm takes as input a linear program and returns an optimal
solution. It starts at some vertex of the simplex and performs a sequence of iterations.
In each iteration, it moves along an edge of the simplex from a current vertex
to a neighboring vertex whose objective value is no smaller than that of the current
vertex (and usually is larger). The simplex algorithm terminates when it reaches
a local maximum, which is a vertex from which all neighboring vertices have a
smaller objective value. Because the feasible region is convex and the objective
function is linear, this local optimum is actually a global optimum. In Section 29.4,
we shall use a concept called “duality” to show that the solution returned by the
simplex algorithm is indeed optimal.

Although  the  geometric  view  gives  a  good  intuitive  view  of  the  operations  of  the 
simplex  algorithm,  we  shall  not  refer  to  it  explicitly  when  developing  the  details 
of  the  simplex  algorithm  in  Section  29.3.  Instead,  we  take  an  algebraic  view.  We 
first  write  the  given  linear  program  in  slack  form,  which  is  a  set  of  linear  equalities. 
These  linear  equalities  express  some  of  the  variables,  called  “basic  variables,”  in 
terms  of  other  variables,  called  “nonbasic  variables.”  We  move  from  one  vertex 
to  another  by  making  a  basic  variable  become  nonbasic  and  making  a  nonbasic 
variable  become  basic.  We  call  this  operation  a  “pivot”  and,  viewed  algebraically, 
it is nothing more than rewriting the linear program in an equivalent slack form.

The  two-variable  example  described  above  was  particularly  simple.  We  shall 
need to address several more details in this chapter. These issues include identifying
linear programs that have no solutions, linear programs that have no finite
optimal solution, and linear programs for which the origin is not a feasible solution.

Applications  of  linear  programming 

Linear programming has a large number of applications. Any textbook on operations
research is filled with examples of linear programming, and linear programming
has become a standard tool taught to students in most business schools. The
election scenario is one typical example. Two more examples of linear programming
are the following:

• An airline wishes to schedule its flight crews. The Federal Aviation Administration
imposes many constraints, such as limiting the number of consecutive
hours that each crew member can work and insisting that a particular crew work
only on one model of aircraft during each month. The airline wants to schedule
crews  on  all  of  its  flights  using  as  few  crew  members  as  possible. 

• An oil company wants to decide where to drill for oil. Siting a drill at a particular
location has an associated cost and, based on geological surveys, an expected
payoff  of  some  number  of  barrels  of  oil.  The  company  has  a  limited  budget  for 
locating  new  drills  and  wants  to  maximize  the  amount  of  oil  it  expects  to  find, 
given  this  budget. 

With linear programs, we also model and solve graph and combinatorial problems,
such as those appearing in this textbook. We have already seen a special
case of linear programming used to solve systems of difference constraints in
Section 24.4. In Section 29.2, we shall study how to formulate several graph and
network-flow problems as linear programs. In Section 35.4, we shall use linear
programming as a tool to find an approximate solution to another graph problem.

Algorithms for linear programming

This  chapter  studies  the  simplex  algorithm.  This  algorithm,  when  implemented 
carefully,  often  solves  general  linear  programs  quickly  in  practice.  With  some 
carefully  contrived  inputs,  however,  the  simplex  algorithm  can  require  exponential 
time.  The  first  polynomial-time  algorithm  for  linear  programming  was  the  ellipsoid 
algorithm, which runs slowly in practice. A second class of polynomial-time algorithms
is known as interior-point methods. In contrast to the simplex algorithm,
which moves along the exterior of the feasible region and maintains a feasible solution
that is a vertex of the simplex at each iteration, these algorithms move through
the  interior  of  the  feasible  region.  The  intermediate  solutions,  while  feasible,  are 
not  necessarily  vertices  of  the  simplex,  but  the  final  solution  is  a  vertex.  For  large 
inputs,  interior-point  algorithms  can  run  as  fast  as,  and  sometimes  faster  than,  the 
simplex  algorithm.  The  chapter  notes  point  you  to  more  information  about  these 
algorithms. 

If  we  add  to  a  linear  program  the  additional  requirement  that  all  variables  take 
on  integer  values,  we  have  an  integer  linear  program.  Exercise  34.5-3  asks  you 
to  show  that  just  finding  a  feasible  solution  to  this  problem  is  NP-hard;  since 
no  polynomial-time  algorithms  are  known  for  any  NP-hard  problems,  there  is  no 
known  polynomial-time  algorithm  for  integer  linear  programming.  In  contrast,  we 
can  solve  a  general  linear-programming  problem  in  polynomial  time. 

In this chapter, if we have a linear program with variables $x = (x_1, x_2, \ldots, x_n)$
and wish to refer to a particular setting of the variables, we shall use the notation
$\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$.


29.1  Standard  and  slack  forms 

This section describes two formats, standard form and slack form, that are useful
when we specify and work with linear programs. In standard form, all the
constraints are inequalities, whereas in slack form, all constraints are equalities
(except for those that require the variables to be nonnegative).

Standard  form 

In standard form, we are given $n$ real numbers $c_1, c_2, \ldots, c_n$; $m$ real numbers
$b_1, b_2, \ldots, b_m$; and $mn$ real numbers $a_{ij}$ for $i = 1, 2, \ldots, m$ and
$j = 1, 2, \ldots, n$. We wish to find $n$ real numbers $x_1, x_2, \ldots, x_n$ that

    maximize    $\sum_{j=1}^{n} c_j x_j$    (29.16)
    subject to
                $\sum_{j=1}^{n} a_{ij} x_j \le b_i$  for $i = 1, 2, \ldots, m$    (29.17)
                $x_j \ge 0$  for $j = 1, 2, \ldots, n$ .    (29.18)

Generalizing the terminology we introduced for the two-variable linear program,
we call expression (29.16) the objective function and the $n + m$ inequalities in
lines (29.17) and (29.18) the constraints. The $n$ constraints in line (29.18) are the
nonnegativity constraints. An arbitrary linear program need not have nonnegativity
constraints, but standard form requires them. Sometimes we find it convenient
to express a linear program in a more compact form. If we create an $m \times n$ matrix
$A = (a_{ij})$, an $m$-vector $b = (b_i)$, an $n$-vector $c = (c_j)$, and an $n$-vector $x = (x_j)$,
then we can rewrite the linear program defined in (29.16)-(29.18) as

    maximize    $c^T x$    (29.19)
    subject to
                $Ax \le b$    (29.20)
                $x \ge 0$ .    (29.21)

In line (29.19), $c^T x$ is the inner product of two vectors. In inequality (29.20), $Ax$
is a matrix-vector product, and in inequality (29.21), $x \ge 0$ means that each entry
of the vector $x$ must be nonnegative. We see that we can specify a linear program
in standard form by a tuple $(A, b, c)$, and we shall adopt the convention that $A$, $b$,
and $c$ always have the dimensions given above.
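
As a minimal sketch of the compact notation, we can store a standard-form linear
program directly as the tuple $(A, b, c)$ and test a candidate setting of the variables
against $Ax \le b$ and $x \ge 0$. The helper names `dot`, `is_feasible`, and
`objective_value` and the numbers below are illustrative assumptions, not part of
the text.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def is_feasible(A, b, x):
    """True if x satisfies Ax <= b and x >= 0 (hypothetical helper)."""
    if any(xj < 0 for xj in x):
        return False
    return all(dot(row, x) <= bi for row, bi in zip(A, b))

def objective_value(c, x):
    """Objective value c^T x of the setting x."""
    return dot(c, x)

# Assumed example data: m = 2 constraints, n = 2 variables.
A = [[1, 1],
     [1, -1]]
b = [4, 2]
c = [3, 2]

print(is_feasible(A, b, [3, 1]), objective_value(c, [3, 1]))  # feasible setting
print(is_feasible(A, b, [4, 1]))    # violates the first constraint
print(is_feasible(A, b, [-1, 0]))   # violates nonnegativity
```

Keeping $A$ as a list of rows means each row pairs naturally with one entry of $b$,
which mirrors the way constraint (29.17) is indexed by $i$.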

We now introduce terminology to describe solutions to linear programs. We used
some of this terminology in the earlier example of a two-variable linear program.
We call a setting of the variables $\bar{x}$ that satisfies all the constraints a feasible
solution, whereas a setting of the variables $\bar{x}$ that fails to satisfy at least one
constraint is an infeasible solution. We say that a solution $\bar{x}$ has objective value
$c^T \bar{x}$. A feasible solution $\bar{x}$ whose objective value is maximum over all feasible
solutions is an optimal solution, and we call its objective value $c^T \bar{x}$ the optimal
objective value. If a linear program has no feasible solutions, we say that the linear
program is infeasible; otherwise it is feasible. If a linear program has some feasible
solutions but does not have a finite optimal objective value, we say that the linear
program is unbounded. Exercise 29.1-9 asks you to show that a linear program can
have a finite optimal objective value even if the feasible region is not bounded.

Converting linear programs into standard form

It is always possible to convert a linear program, given as minimizing or
maximizing a linear function subject to linear constraints, into standard form. A linear
program might not be in standard form for any of four possible reasons:

1. The objective function might be a minimization rather than a maximization.

2. There might be variables without nonnegativity constraints.

3. There might be equality constraints, which have an equal sign rather than a
less-than-or-equal-to sign.

4. There might be inequality constraints, but instead of having a less-than-or-
equal-to sign, they have a greater-than-or-equal-to sign.

When converting one linear program $L$ into another linear program $L'$, we would
like the property that an optimal solution to $L'$ yields an optimal solution to $L$. To
capture this idea, we say that two maximization linear programs $L$ and $L'$ are
equivalent if for each feasible solution $\bar{x}$ to $L$ with objective value $z$, there is
a corresponding feasible solution $\bar{x}'$ to $L'$ with objective value $z$, and for each
feasible solution $\bar{x}'$ to $L'$ with objective value $z$, there is a corresponding feasible
solution $\bar{x}$ to $L$ with objective value $z$. (This definition does not imply a one-to-
one correspondence between feasible solutions.) A minimization linear program $L$
and a maximization linear program $L'$ are equivalent if for each feasible solution $\bar{x}$
to $L$ with objective value $z$, there is a corresponding feasible solution $\bar{x}'$ to $L'$ with
objective value $-z$, and for each feasible solution $\bar{x}'$ to $L'$ with objective value $z$,
there is a corresponding feasible solution $\bar{x}$ to $L$ with objective value $-z$.

We  now  show  how  to  remove,  one  by  one,  each  of  the  possible  problems  in  the 
list  above.  After  removing  each  one,  we  shall  argue  that  the  new  linear  program  is 
equivalent  to  the  old  one. 

To convert a minimization linear program $L$ into an equivalent maximization
linear program $L'$, we simply negate the coefficients in the objective function. Since
$L$ and $L'$ have identical sets of feasible solutions and, for any feasible solution, the
objective value in $L$ is the negative of the objective value in $L'$, these two linear
programs are equivalent. For example, if we have the linear program

    minimize    $-2x_1 + 3x_2$
    subject to
                $x_1 + x_2 = 7$
                $x_1 - 2x_2 \le 4$
                $x_1 \ge 0$ ,

and we negate the coefficients of the objective function, we obtain

    maximize    $2x_1 - 3x_2$
    subject to
                $x_1 + x_2 = 7$
                $x_1 - 2x_2 \le 4$
                $x_1 \ge 0$ .

Next, we show how to convert a linear program in which some of the variables
do not have nonnegativity constraints into one in which each variable has a
nonnegativity constraint. Suppose that some variable $x_j$ does not have a nonnegativity
constraint. Then, we replace each occurrence of $x_j$ by $x'_j - x''_j$, and add the
nonnegativity constraints $x'_j \ge 0$ and $x''_j \ge 0$. Thus, if the objective function has a
term $c_j x_j$, we replace it by $c_j x'_j - c_j x''_j$, and if constraint $i$ has a term $a_{ij} x_j$, we
replace it by $a_{ij} x'_j - a_{ij} x''_j$. Any feasible solution $\hat{x}$ to the new linear program
corresponds to a feasible solution $\bar{x}$ to the original linear program with
$\bar{x}_j = \hat{x}'_j - \hat{x}''_j$ and with the same objective value. Also, any feasible solution $\bar{x}$
to the original linear program corresponds to a feasible solution $\hat{x}$ to the new linear
program with $\hat{x}'_j = \bar{x}_j$ and $\hat{x}''_j = 0$ if $\bar{x}_j \ge 0$, or with $\hat{x}''_j = -\bar{x}_j$ and
$\hat{x}'_j = 0$ if $\bar{x}_j < 0$. In either case $\hat{x}'_j - \hat{x}''_j = \bar{x}_j$, so the two
linear programs have the same objective value regardless of the sign of $\bar{x}_j$. Thus,
the two linear programs are equivalent. We apply this conversion scheme to each
variable that does not have a nonnegativity constraint to yield an equivalent linear
program in which all variables have nonnegativity constraints.

Continuing the example, we want to ensure that each variable has a corresponding
nonnegativity constraint. Variable $x_1$ has such a constraint, but variable $x_2$ does
not. Therefore, we replace $x_2$ by two variables $x'_2$ and $x''_2$, and we modify the linear
program to obtain

    maximize    $2x_1 - 3x'_2 + 3x''_2$
    subject to
                $x_1 + x'_2 - x''_2 = 7$    (29.22)
                $x_1 - 2x'_2 + 2x''_2 \le 4$
                $x_1, x'_2, x''_2 \ge 0$ .


Next, we convert equality constraints into inequality constraints. Suppose that a
linear program has an equality constraint $f(x_1, x_2, \ldots, x_n) = b$. Since $x = y$ if
and only if both $x \ge y$ and $x \le y$, we can replace this equality constraint by the
pair of inequality constraints $f(x_1, x_2, \ldots, x_n) \le b$ and $f(x_1, x_2, \ldots, x_n) \ge b$.
Repeating this conversion for each equality constraint yields a linear program in
which all constraints are inequalities.

Finally, we can convert the greater-than-or-equal-to constraints to less-than-or-
equal-to constraints by multiplying these constraints through by $-1$. That is, any
inequality of the form

    $\sum_{j=1}^{n} a_{ij} x_j \ge b_i$

is equivalent to

    $\sum_{j=1}^{n} -a_{ij} x_j \le -b_i$ .

Thus, by replacing each coefficient $a_{ij}$ by $-a_{ij}$ and each value $b_i$ by $-b_i$, we obtain
an equivalent less-than-or-equal-to constraint.

Finishing our example, we replace the equality in constraint (29.22) by two
inequalities, obtaining

    maximize    $2x_1 - 3x'_2 + 3x''_2$
    subject to
                $x_1 + x'_2 - x''_2 \le 7$
                $x_1 + x'_2 - x''_2 \ge 7$    (29.23)
                $x_1 - 2x'_2 + 2x''_2 \le 4$
                $x_1, x'_2, x''_2 \ge 0$ .

Finally, we negate constraint (29.23). For consistency in variable names, we
rename $x'_2$ to $x_2$ and $x''_2$ to $x_3$, obtaining the standard form

    maximize    $2x_1 - 3x_2 + 3x_3$    (29.24)
    subject to
                $x_1 + x_2 - x_3 \le 7$    (29.25)
                $-x_1 - x_2 + x_3 \le -7$    (29.26)
                $x_1 - 2x_2 + 2x_3 \le 4$    (29.27)
                $x_1, x_2, x_3 \ge 0$ .    (29.28)
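
The four conversion steps can be sketched in a few lines of Python. This is only an
illustration under assumptions of our own: the function name `to_standard_form`,
the input encoding (a sense string, coefficient lists, and a set of 0-based indices of
variables that already have nonnegativity constraints) are hypothetical choices, not
notation from the text.

```python
def to_standard_form(sense, c, constraints, nonneg):
    """Convert a general linear program into standard form (A, b, c).

    sense       -- "min" or "max"
    c           -- objective coefficients, one per original variable
    constraints -- list of (coeffs, rel, rhs) with rel in {"<=", ">=", "="}
    nonneg      -- set of 0-based indices j whose variable already has x_j >= 0
    """
    # Step 1: a minimization becomes a maximization of the negated objective.
    if sense == "min":
        c = [-cj for cj in c]

    # Step 2: each unrestricted x_j is replaced by x'_j - x''_j, so its
    # coefficient appears twice, once with each sign.
    def expand(row):
        out = []
        for j, coef in enumerate(row):
            out.append(coef)
            if j not in nonneg:
                out.append(-coef)
        return out

    A, b = [], []
    for coeffs, rel, rhs in constraints:
        row = expand(coeffs)
        # Step 3: an equality is replaced by a <= and a >= constraint.
        rels = ["<=", ">="] if rel == "=" else [rel]
        for r in rels:
            if r == "<=":
                A.append(list(row))
                b.append(rhs)
            else:
                # Step 4: multiply a >= constraint through by -1.
                A.append([-a for a in row])
                b.append(-rhs)
    return A, b, expand(c)

# The running example: minimize -2x1 + 3x2 subject to
# x1 + x2 = 7, x1 - 2x2 <= 4, x1 >= 0 (x2 is unrestricted).
A, b, c = to_standard_form(
    "min", [-2, 3],
    [([1, 1], "=", 7), ([1, -2], "<=", 4)],
    nonneg={0},
)
print(A, b, c)
```

The columns of the result follow the renaming used above: the two columns
produced for the unrestricted variable play the roles of $x_2$ and $x_3$ in (29.24)-(29.28).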

Converting  linear  programs  into  slack  form 

To efficiently solve a linear program with the simplex algorithm, we prefer to
express it in a form in which some of the constraints are equality constraints. More
precisely, we shall convert it into a form in which the nonnegativity constraints are
the only inequality constraints, and the remaining constraints are equalities. Let

    $\sum_{j=1}^{n} a_{ij} x_j \le b_i$    (29.29)

be an inequality constraint. We introduce a new variable $s$ and rewrite
inequality (29.29) as the two constraints

    $s = b_i - \sum_{j=1}^{n} a_{ij} x_j$ ,    (29.30)
    $s \ge 0$ .    (29.31)

We call $s$ a slack variable because it measures the slack, or difference, between
the left-hand and right-hand sides of inequality (29.29). (We shall soon see why we
find it convenient to write the constraint with only the slack variable on the left-
hand side.) Because inequality (29.29) is true if and only if both equation (29.30)
and inequality (29.31) are true, we can convert each inequality constraint of a
linear program in this way to obtain an equivalent linear program in which the only
inequality constraints are the nonnegativity constraints. When converting from
standard to slack form, we shall use $x_{n+i}$ (instead of $s$) to denote the slack variable
associated with the $i$th inequality. The $i$th constraint is therefore

    $x_{n+i} = b_i - \sum_{j=1}^{n} a_{ij} x_j$ ,    (29.32)

along with the nonnegativity constraint $x_{n+i} \ge 0$.

By converting each constraint of a linear program in standard form, we obtain a
linear program in a different form. For example, for the linear program described
in (29.24)-(29.28), we introduce slack variables $x_4$, $x_5$, and $x_6$, obtaining

    maximize    $2x_1 - 3x_2 + 3x_3$    (29.33)
    subject to
                $x_4 = 7 - x_1 - x_2 + x_3$    (29.34)
                $x_5 = -7 + x_1 + x_2 - x_3$    (29.35)
                $x_6 = 4 - x_1 + 2x_2 - 2x_3$    (29.36)
                $x_1, x_2, x_3, x_4, x_5, x_6 \ge 0$ .    (29.37)

In this linear program, all the constraints except for the nonnegativity constraints
are equalities, and each variable is subject to a nonnegativity constraint. We write
each equality constraint with one of the variables on the left-hand side of the
equality and all others on the right-hand side. Furthermore, each equation has the same
set of variables on the right-hand side, and these variables are also the only ones
that appear in the objective function. We call the variables on the left-hand side of
the equalities basic variables and those on the right-hand side nonbasic variables.

For linear programs that satisfy these conditions, we shall sometimes omit the
words “maximize” and “subject to,” as well as the explicit nonnegativity
constraints. We shall also use the variable $z$ to denote the value of the objective
function. We call the resulting format slack form. If we write the linear program given
in (29.33)-(29.37) in slack form, we obtain

    $z = 2x_1 - 3x_2 + 3x_3$    (29.38)
    $x_4 = 7 - x_1 - x_2 + x_3$    (29.39)
    $x_5 = -7 + x_1 + x_2 - x_3$    (29.40)
    $x_6 = 4 - x_1 + 2x_2 - 2x_3$ .    (29.41)

As with standard form, we find it convenient to have a more concise notation
for describing a slack form. As we shall see in Section 29.3, the sets of basic and
nonbasic variables will change as the simplex algorithm runs. We use $N$ to denote
the set of indices of the nonbasic variables and $B$ to denote the set of indices of
the basic variables. We always have that $|N| = n$, $|B| = m$, and
$N \cup B = \{1, 2, \ldots, n + m\}$. The equations are indexed by the entries of $B$, and the
variables on the right-hand sides are indexed by the entries of $N$. As in standard form,
we use $b_i$, $c_j$, and $a_{ij}$ to denote constant terms and coefficients. We also use $v$ to
denote an optional constant term in the objective function. (We shall see a little later
that including the constant term in the objective function makes it easy to determine
the value of the objective function.) Thus we can concisely define a slack form by a
tuple $(N, B, A, b, c, v)$, denoting the slack form

    $z = v + \sum_{j \in N} c_j x_j$    (29.42)
    $x_i = b_i - \sum_{j \in N} a_{ij} x_j$  for $i \in B$ ,    (29.43)

in which all variables $x$ are constrained to be nonnegative. Because we subtract
the sum $\sum_{j \in N} a_{ij} x_j$ in (29.43), the values $a_{ij}$ are actually the negatives of the
coefficients as they “appear” in the slack form.
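
To make the tuple $(N, B, A, b, c, v)$ concrete, the following sketch converts a
standard-form program $(A, b, c)$ into such a tuple, giving slack variable $x_{n+i}$ to
the $i$th inequality. The function name `standard_to_slack` and the choice of Python
dictionaries keyed by variable index are assumptions made for illustration.

```python
def standard_to_slack(A, b, c):
    """Return (N, B, A_s, b_s, c_s, v) for the standard form (A, b, c).

    Variables are numbered from 1 as in the text. The dictionaries are keyed
    by variable index, so A_s[i, j] is a_ij for i in B and j in N.
    """
    m, n = len(A), len(c)
    N = list(range(1, n + 1))              # original variables are nonbasic
    B = list(range(n + 1, n + m + 1))      # one slack variable per inequality
    A_s = {(n + i, j): A[i - 1][j - 1] for i in range(1, m + 1) for j in N}
    b_s = {n + i: b[i - 1] for i in range(1, m + 1)}
    c_s = {j: c[j - 1] for j in N}
    v = 0                                  # no constant term in the objective yet
    return N, B, A_s, b_s, c_s, v

# Standard form (29.24)-(29.28).
N, B, A_s, b_s, c_s, v = standard_to_slack(
    [[1, 1, -1], [-1, -1, 1], [1, -2, 2]], [7, -7, 4], [2, -3, 3]
)
print(N, B, b_s, c_s, v)
```

Because (29.43) subtracts the sum, the entry `A_s[5, 1]` is $-1$ even though
$x_1$ appears with coefficient $+1$ in equation (29.40).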

For example, in the slack form

    $z = 28 - \frac{x_3}{6} - \frac{x_5}{6} - \frac{2x_6}{3}$
    $x_1 = 8 + \frac{x_3}{6} + \frac{x_5}{6} - \frac{x_6}{3}$
    $x_2 = 4 - \frac{8x_3}{3} - \frac{2x_5}{3} + \frac{x_6}{3}$
    $x_4 = 18 - \frac{x_3}{2} + \frac{x_5}{2}$ ,

we have $B = \{1, 2, 4\}$, $N = \{3, 5, 6\}$,

    $A = \begin{pmatrix} a_{13} & a_{15} & a_{16} \\ a_{23} & a_{25} & a_{26} \\ a_{43} & a_{45} & a_{46} \end{pmatrix}
       = \begin{pmatrix} -1/6 & -1/6 & 1/3 \\ 8/3 & 2/3 & -1/3 \\ 1/2 & -1/2 & 0 \end{pmatrix}$ ,

    $b = ( b_1 \; b_2 \; b_4 )^T = ( 8 \; 4 \; 18 )^T$ ,

$c = ( c_3 \; c_5 \; c_6 )^T = ( -1/6 \; -1/6 \; -2/3 )^T$, and $v = 28$. Note that the
indices into $A$, $b$, and $c$ are not necessarily sets of contiguous integers; they depend
on the index sets $B$ and $N$. As an example of the entries of $A$ being the negatives
of the coefficients as they appear in the slack form, observe that the equation for $x_1$
includes the term $x_3/6$, yet the coefficient $a_{13}$ is actually $-1/6$ rather than $+1/6$.
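
As a quick check of the sign convention, the sketch below stores this example as a
tuple $(N, B, A, b, c, v)$ and evaluates equations (29.42) and (29.43) for chosen values
of the nonbasic variables. The function name `evaluate` is a hypothetical helper; exact
rational arithmetic from Python's `fractions` module avoids rounding.

```python
from fractions import Fraction as F

N = [3, 5, 6]
B = [1, 2, 4]
A = {(1, 3): F(-1, 6), (1, 5): F(-1, 6), (1, 6): F(1, 3),
     (2, 3): F(8, 3),  (2, 5): F(2, 3),  (2, 6): F(-1, 3),
     (4, 3): F(1, 2),  (4, 5): F(-1, 2), (4, 6): F(0)}
b = {1: F(8), 2: F(4), 4: F(18)}
c = {3: F(-1, 6), 5: F(-1, 6), 6: F(-2, 3)}
v = F(28)

def evaluate(N, B, A, b, c, v, nonbasic):
    """Given values for the nonbasic variables, return (z, all variable values)."""
    x = dict(nonbasic)
    for i in B:
        # Equation (29.43): note the subtraction of a_ij * x_j.
        x[i] = b[i] - sum(A[i, j] * x[j] for j in N)
    # Equation (29.42).
    z = v + sum(c[j] * x[j] for j in N)
    return z, x

# All nonbasic variables set to 0: each basic variable equals its constant term.
z0, x0 = evaluate(N, B, A, b, c, v, {3: 0, 5: 0, 6: 0})
# Raising x6 to 3 lowers z, since c6 is negative.
z1, x1 = evaluate(N, B, A, b, c, v, {3: 0, 5: 0, 6: 3})
print(z0, x0)
print(z1, x1)
```

Setting every nonbasic variable to 0 reads the constants $b_i$ and $v$ straight off the
slack form, which is why including $v$ in the objective function is convenient.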


Exercises 


29.1-1 

If we express the linear program in (29.24)-(29.28) in the compact notation of
(29.19)-(29.21), what are $n$, $m$, $A$, $b$, and $c$?


29.1-2 

Give  three  feasible  solutions  to  the  linear  program  in  (29.24)-(29.28).  What  is  the 
objective  value  of  each  one? 


29.1-3 

For the slack form in (29.38)-(29.41), what are $N$, $B$, $A$, $b$, $c$, and $v$?


29.1-4 

Convert the following linear program into standard form:

    minimize    $2x_1 + 7x_2 + x_3$
    subject to
                $x_1 - x_3 = 7$
                $3x_1 + x_2 \ge 24$
                $x_2 \ge 0$
                $x_3 \le 0$ .


29.1-5

Convert the following linear program into slack form:

    maximize    $2x_1 - 6x_3$
    subject to
                $x_1 + x_2 - x_3 \le 7$
                $3x_1 - x_2 \ge 8$
                $-x_1 + 2x_2 + 2x_3 \ge 0$
                $x_1, x_2, x_3 \ge 0$ .

What are the basic and nonbasic variables?


29.1-6 


Show that the following linear program is infeasible:

    maximize    $3x_1 - 2x_2$
    subject to
                $x_1 + x_2 \le 2$
                $-2x_1 - 2x_2 \le -10$
                $x_1, x_2 \ge 0$ .


29.1-7

Show that the following linear program is unbounded:

    maximize    $x_1 - x_2$
    subject to
                $-2x_1 + x_2 \le -1$
                $-x_1 - 2x_2 \le -2$
                $x_1, x_2 \ge 0$ .


29.1-8 

Suppose that we have a general linear program with $n$ variables and $m$ constraints,
and suppose that we convert it into standard form. Give an upper bound on the
number of variables and constraints in the resulting linear program.


29.1-9 

Give  an  example  of  a  linear  program  for  which  the  feasible  region  is  not  bounded, 
but  the  optimal  objective  value  is  finite. 


29.2 Formulating problems as linear programs

Although we shall focus on the simplex algorithm in this chapter, it is also
important to be able to recognize when we can formulate a problem as a linear program.
Once we cast a problem as a polynomial-sized linear program, we can solve it
in polynomial time by the ellipsoid algorithm or interior-point methods. Several
linear-programming software packages can solve problems efficiently, so that once
the problem is in the form of a linear program, such a package can solve it.

We shall look at several concrete examples of linear-programming problems. We
start with two problems that we have already studied: the single-source shortest-
paths problem (see Chapter 24) and the maximum-flow problem (see Chapter 26).
We then describe the minimum-cost-flow problem. Although the minimum-cost-
flow problem has a polynomial-time algorithm that is not based on linear
programming, we won’t describe the algorithm. Finally, we describe the multicommodity-
flow problem, for which the only known polynomial-time algorithm is based on
linear programming.

When we solved graph problems in Part VI, we used attribute notation, such
as $v.d$ and $(u, v).f$. Linear programs typically use subscripted variables rather
than objects with attached attributes, however. Therefore, when we express
variables in linear programs, we shall indicate vertices and edges through subscripts.
For example, we denote the shortest-path weight for vertex $v$ not by $v.d$ but by $d_v$.
Similarly, we denote the flow from vertex $u$ to vertex $v$ not by $(u, v).f$ but by $f_{uv}$.
For quantities that are given as inputs to problems, such as edge weights or
capacities, we shall continue to use notations such as $w(u, v)$ and $c(u, v)$.

Shortest  paths 

We  can  formulate  the  single-source  shortest-paths  problem  as  a  linear  program. 
In  this  section,  we  shall  focus  on  how  to  formulate  the  single-pair  shortest-path 
problem,  leaving  the  extension  to  the  more  general  single-source  shortest-paths 
problem  as  Exercise  29.2-3. 

In the single-pair shortest-path problem, we are given a weighted, directed graph
$G = (V, E)$, with weight function $w : E \to \mathbb{R}$ mapping edges to real-valued
weights, a source vertex $s$, and destination vertex $t$. We wish to compute the
value $d_t$, which is the weight of a shortest path from $s$ to $t$. To express this
problem as a linear program, we need to determine a set of variables and constraints that
define when we have a shortest path from $s$ to $t$. Fortunately, the Bellman-Ford
algorithm does exactly this. When the Bellman-Ford algorithm terminates, it has
computed, for each vertex $v$, a value $d_v$ (using subscript notation here rather than
attribute notation) such that for each edge $(u, v) \in E$, we have $d_v \le d_u + w(u, v)$.
The source vertex initially receives a value $d_s = 0$, which never changes. Thus
we obtain the following linear program to compute the shortest-path weight from $s$
to $t$:

    maximize    $d_t$    (29.44)
    subject to
                $d_v \le d_u + w(u, v)$  for each edge $(u, v) \in E$ ,    (29.45)
                $d_s = 0$ .    (29.46)

You might be surprised that this linear program maximizes an objective function
when it is supposed to compute shortest paths. We do not want to minimize the
objective function, since then setting $\bar{d}_v = 0$ for all $v \in V$ would yield an optimal
solution to the linear program without solving the shortest-paths problem. We
maximize because an optimal solution to the shortest-paths problem sets each $\bar{d}_v$
to $\min_{u : (u, v) \in E} \{ \bar{d}_u + w(u, v) \}$, so that $\bar{d}_v$ is the largest value that is less than or
equal to all of the values in the set $\{ \bar{d}_u + w(u, v) \}$. We want to maximize $d_v$
for all vertices $v$ on a shortest path from $s$ to $t$ subject to these constraints on all
vertices $v$, and maximizing $d_t$ achieves this goal.

This linear program has $|V|$ variables $d_v$, one for each vertex $v \in V$. It also
has $|E| + 1$ constraints: one for each edge, plus the additional constraint that the
source vertex’s shortest-path weight always has the value 0.
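
The following sketch builds the constraints (29.45) and (29.46) as plain data and
checks that the values computed by a Bellman-Ford pass satisfy them, while raising
$d_t$ any further does not. The graph, the function names, and the encoding of a
constraint as a pair (coefficients, right-hand side) are assumptions for illustration;
the sketch also assumes no self-loops, no negative-weight cycles, and that every
vertex is reachable from $s$.

```python
def shortest_path_lp(edges, s, t):
    """Build LP (29.44)-(29.46); edges maps (u, v) to w(u, v)."""
    objective = {t: 1}                          # maximize d_t
    inequalities = [({v: 1, u: -1}, w)          # d_v - d_u <= w(u, v)
                    for (u, v), w in edges.items()]
    equalities = [({s: 1}, 0)]                  # d_s = 0
    return objective, inequalities, equalities

def bellman_ford(vertices, edges, s):
    """Shortest-path weights from s by |V| - 1 rounds of relaxation."""
    d = {v: float("inf") for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):
        for (u, v), w in edges.items():
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    return d

def satisfies(d, inequalities, equalities):
    """True if the assignment d meets every constraint."""
    def lhs(coeffs):
        return sum(a * d[x] for x, a in coeffs.items())
    return (all(lhs(co) <= rhs for co, rhs in inequalities)
            and all(lhs(co) == rhs for co, rhs in equalities))

# Assumed example graph.
vertices = ["s", "a", "b", "t"]
edges = {("s", "a"): 2, ("s", "b"): 5, ("a", "b"): 1, ("a", "t"): 6, ("b", "t"): 2}

objective, ineq, eq = shortest_path_lp(edges, "s", "t")
d = bellman_ford(vertices, edges, "s")
print(d["t"], satisfies(d, ineq, eq))
print(satisfies({**d, "t": d["t"] + 1}, ineq, eq))   # d_t cannot be raised
```

The last line illustrates why the objective is a maximization: the edge constraints
cap $d_t$ from above, and the shortest-path weight is the largest value they allow.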

Maximum  flow 

Next, we express the maximum-flow problem as a linear program. Recall that we
are given a directed graph $G = (V, E)$ in which each edge $(u, v) \in E$ has a
nonnegative capacity $c(u, v) \ge 0$, and two distinguished vertices: a source $s$ and
a sink $t$. As defined in Section 26.1, a flow is a nonnegative real-valued function
$f : V \times V \to \mathbb{R}$ that satisfies the capacity constraint and flow conservation. A
maximum flow is a flow that satisfies these constraints and maximizes the flow
value, which is the total flow coming out of the source minus the total flow into the
source. A flow, therefore, satisfies linear constraints, and the value of a flow is a
linear function. Recalling also that we assume that $c(u, v) = 0$ if $(u, v) \notin E$ and
that there are no antiparallel edges, we can express the maximum-flow problem as
a linear program:

    maximize    $\sum_{v \in V} f_{sv} - \sum_{v \in V} f_{vs}$    (29.47)
    subject to
                $f_{uv} \le c(u, v)$  for each $u, v \in V$ ,    (29.48)
                $\sum_{v \in V} f_{vu} = \sum_{v \in V} f_{uv}$  for each $u \in V - \{s, t\}$ ,    (29.49)
                $f_{uv} \ge 0$  for each $u, v \in V$ .    (29.50)

This linear program has $|V|^2$ variables, corresponding to the flow between each
pair of vertices, and it has $2|V|^2 + |V| - 2$ constraints.

It is usually more efficient to solve a smaller-sized linear program. The linear
program in (29.47)-(29.50) has, for ease of notation, a flow and capacity of 0 for
each pair of vertices $u, v$ with $(u, v) \notin E$. It would be more efficient to rewrite the
linear program so that it has $O(V + E)$ constraints. Exercise 29.2-5 asks you to
do so.
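
To see where the count $2|V|^2 + |V| - 2$ comes from, the sketch below checks a
candidate flow against (29.48)-(29.50) one constraint at a time and tallies how many
constraints it examined. The function name `check_flow` and the small network are
hypothetical; missing dictionary entries stand for the value 0, matching the
convention that $c(u, v) = 0$ when $(u, v) \notin E$.

```python
from itertools import product

def check_flow(V, cap, f, s, t):
    """Check f against (29.48)-(29.50); return (is_flow, value, constraints_checked)."""
    def c(u, v):
        return cap.get((u, v), 0)          # capacity 0 for non-edges
    def fl(u, v):
        return f.get((u, v), 0)

    ok, checked = True, 0
    for u, v in product(V, repeat=2):
        ok = ok and fl(u, v) <= c(u, v)    # capacity constraint (29.48)
        ok = ok and fl(u, v) >= 0          # nonnegativity constraint (29.50)
        checked += 2
    for u in V:
        if u not in (s, t):                # flow conservation (29.49)
            ok = ok and sum(fl(v, u) for v in V) == sum(fl(u, v) for v in V)
            checked += 1
    # Objective (29.47): flow out of the source minus flow into it.
    value = sum(fl(s, v) for v in V) - sum(fl(v, s) for v in V)
    return ok, value, checked

# Assumed example network and flow.
V = ["s", "a", "b", "t"]
cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
f = dict(cap)                              # saturate every edge

print(check_flow(V, cap, f, "s", "t"))
```

With $|V| = 4$ the tally is $2 \cdot 16 + 4 - 2 = 34$, although only five of the sixteen
vertex pairs are edges, which is the waste that Exercise 29.2-5 asks you to remove.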

Minimum-cost  flow 

In this section, we have used linear programming to solve problems for which we
already knew efficient algorithms. In fact, an efficient algorithm designed
specifically for a problem, such as Dijkstra’s algorithm for the single-source shortest-
paths problem, or the push-relabel method for maximum flow, will often be more
efficient than linear programming, both in theory and in practice.

The real power of linear programming comes from the ability to solve new
problems. Recall the problem faced by the politician in the beginning of this chapter.
The problem of obtaining a sufficient number of votes, while not spending too
much money, is not solved by any of the algorithms that we have studied in this
book, yet we can solve it by linear programming. Books abound with such real-
world problems that linear programming can solve. Linear programming is also
particularly useful for solving variants of problems for which we may not already
know of an efficient algorithm.

Consider, for example, the following generalization of the maximum-flow
problem. Suppose that, in addition to a capacity $c(u, v)$ for each edge $(u, v)$, we are
given a real-valued cost $a(u, v)$. As in the maximum-flow problem, we assume that
$c(u, v) = 0$ if $(u, v) \notin E$, and that there are no antiparallel edges. If we send $f_{uv}$
units of flow over edge $(u, v)$, we incur a cost of $a(u, v) f_{uv}$. We are also given a
flow demand $d$. We wish to send $d$ units of flow from $s$ to $t$ while minimizing the
total cost $\sum_{(u,v) \in E} a(u, v) f_{uv}$ incurred by the flow. This problem is known as the
minimum-cost-flow problem.

Figure 29.3(a) shows an example of the minimum-cost-flow problem. We wish
to send 4 units of flow from $s$ to $t$ while incurring the minimum total cost. Any
particular legal flow, that is, a function $f$ satisfying constraints (29.48)-(29.49),
incurs a total cost of $\sum_{(u,v) \in E} a(u, v) f_{uv}$. We wish to find the particular 4-unit
flow that minimizes this cost. Figure 29.3(b) shows an optimal solution, with total
cost $\sum_{(u,v) \in E} a(u, v) f_{uv} = (2 \cdot 2) + (5 \cdot 2) + (3 \cdot 1) + (7 \cdot 1) + (1 \cdot 3) = 27$.

Figure 29.3 (a) An example of a minimum-cost-flow problem. We denote the capacities by $c$ and
the costs by $a$. Vertex $s$ is the source and vertex $t$ is the sink, and we wish to send 4 units of flow
from $s$ to $t$. (b) A solution to the minimum-cost-flow problem in which 4 units of flow are sent from $s$
to $t$. For each edge, the flow and capacity are written as flow/capacity.

There are polynomial-time algorithms specifically designed for the minimum-
cost-flow problem, but they are beyond the scope of this book. We can, however,
express the minimum-cost-flow problem as a linear program. The linear program
looks similar to the one for the maximum-flow problem with the additional
constraint that the value of the flow be exactly $d$ units, and with the new objective
function of minimizing the cost:

    minimize    $\sum_{(u,v) \in E} a(u, v) f_{uv}$    (29.51)
    subject to
                $f_{uv} \le c(u, v)$  for each $u, v \in V$ ,
                $\sum_{v \in V} f_{vu} - \sum_{v \in V} f_{uv} = 0$  for each $u \in V - \{s, t\}$ ,
                $\sum_{v \in V} f_{sv} - \sum_{v \in V} f_{vs} = d$ ,
                $f_{uv} \ge 0$  for each $u, v \in V$ .    (29.52)
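
As a small sanity check on this formulation, the sketch below evaluates the objective
(29.51) and the constraints for one 4-unit flow. The edge data is an assumption: the
costs and flow values are chosen to match the terms of the cost computation
$(2 \cdot 2) + (5 \cdot 2) + (3 \cdot 1) + (7 \cdot 1) + (1 \cdot 3)$ given for Figure 29.3(b), while the
vertex names and capacities are illustrative and should be compared against the figure.
The function names are hypothetical.

```python
# (u, v): (capacity c(u, v), cost a(u, v)); assumed data, see the note above.
edges = {
    ("s", "x"): (5, 2),
    ("s", "y"): (2, 5),
    ("x", "y"): (1, 3),
    ("x", "t"): (2, 7),
    ("y", "t"): (4, 1),
}
# Flow f_uv on each edge.
flow = {("s", "x"): 2, ("s", "y"): 2, ("x", "y"): 1, ("x", "t"): 1, ("y", "t"): 3}

def total_cost(edges, flow):
    """Objective (29.51): the sum of a(u, v) * f_uv over all edges."""
    return sum(edges[e][1] * flow[e] for e in flow)

def net_outflow(flow, u):
    """Flow leaving u minus flow entering u."""
    out = sum(f for (a, _), f in flow.items() if a == u)
    into = sum(f for (_, b), f in flow.items() if b == u)
    return out - into

def respects_capacities(edges, flow):
    """Capacity and nonnegativity constraints on every edge."""
    return all(0 <= flow[e] <= edges[e][0] for e in flow)

print(total_cost(edges, flow))                              # cost of this flow
print(net_outflow(flow, "s"))                               # demand d shipped
print(net_outflow(flow, "x"), net_outflow(flow, "y"))       # conservation
print(respects_capacities(edges, flow))
```

The check confirms only that this flow is legal and what it costs; establishing that no
cheaper 4-unit flow exists is the job of the linear program itself.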

Multicommodity  flow 

As  a  final  example,  we  consider  another  flow  problem.  Suppose  that  the  Lucky 
Puck  company  from  Section  26.1  decides  to  diversify  its  product  line  and  ship 
not  only  hockey  pucks,  but  also  hockey  sticks  and  hockey  helmets.  Each  piece  of 
equipment  is  manufactured  in  its  own  factory,  has  its  own  warehouse,  and  must 
be  shipped,  each  day,  from  factory  to  warehouse.  The  sticks  are  manufactured  in 
Vancouver  and  must  be  shipped  to  Saskatoon,  and  the  helmets  are  manufactured  in 
Edmonton and must be shipped to Regina. The capacity of the shipping network
does not change, however, and the different items, or commodities, must share the
same network.

This example is an instance of a multicommodity-flow problem. In this problem,
we are again given a directed graph $G = (V, E)$ in which each edge $(u, v) \in E$
has a nonnegative capacity $c(u, v) \ge 0$. As in the maximum-flow problem, we
implicitly assume that $c(u, v) = 0$ for $(u, v) \notin E$, and that the graph has no
antiparallel edges. In addition, we are given $k$ different commodities, $K_1, K_2, \ldots, K_k$,
where we specify commodity $i$ by the triple $K_i = (s_i, t_i, d_i)$. Here, vertex $s_i$ is
the source of commodity $i$, vertex $t_i$ is the sink of commodity $i$, and $d_i$ is the
demand for commodity $i$, which is the desired flow value for the commodity from $s_i$
to $t_i$. We define a flow for commodity $i$, denoted by $f_i$ (so that $f_{iuv}$ is the flow of
commodity $i$ from vertex $u$ to vertex $v$) to be a real-valued function that satisfies
the flow-conservation and capacity constraints. We now define $f_{uv}$, the aggregate
flow, to be the sum of the various commodity flows, so that $f_{uv} = \sum_{i=1}^{k} f_{iuv}$. The
aggregate flow on edge $(u, v)$ must be no more than the capacity of edge $(u, v)$.
We are not trying to minimize any objective function in this problem; we need
only determine whether such a flow exists. Thus, we write a linear program with a
“null” objective function:

    minimize    $0$
    subject to
                $\sum_{i=1}^{k} f_{iuv} \le c(u, v)$  for each $u, v \in V$ ,
                $\sum_{v \in V} f_{iuv} - \sum_{v \in V} f_{ivu} = 0$  for each $i = 1, 2, \ldots, k$ and
                                                          for each $u \in V - \{s_i, t_i\}$ ,
                $\sum_{v \in V} f_{i,s_i,v} - \sum_{v \in V} f_{i,v,s_i} = d_i$  for each $i = 1, 2, \ldots, k$ ,
                $f_{iuv} \ge 0$  for each $u, v \in V$ and
                                 for each $i = 1, 2, \ldots, k$ .

The  only  known  polynomial-time  algorithm  for  this  problem  expresses  it  as  a  linear 
program  and  then  solves  it  with  a  polynomial-time  linear-programming  algorithm. 
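
Because the objective is null, the linear program above is purely a feasibility
question. The sketch below checks whether a given family of commodity flows
satisfies its constraints; it does not find such flows. The function name, the
dictionary encoding (missing entries mean 0), and the tiny network are assumptions
for illustration.

```python
def is_multicommodity_flow(V, cap, commodities, flows):
    """commodities: list of (s_i, t_i, d_i); flows: one dict (u, v) -> f_iuv per commodity."""
    def get(d, u, v):
        return d.get((u, v), 0)

    # Aggregate flow on each pair must not exceed the capacity.
    for u in V:
        for v in V:
            if sum(get(f, u, v) for f in flows) > get(cap, u, v):
                return False

    for (s, t, d), f in zip(commodities, flows):
        # Nonnegativity of each commodity flow.
        if any(val < 0 for val in f.values()):
            return False
        for u in V:
            net = sum(get(f, u, v) for v in V) - sum(get(f, v, u) for v in V)
            if u == s:
                if net != d:               # demand constraint at the source
                    return False
            elif u != t:
                if net != 0:               # flow conservation elsewhere
                    return False
    return True

# Assumed example: two commodities sharing the edge (a, b).
V = ["a", "b", "c"]
cap = {("a", "b"): 3, ("b", "c"): 2}
commodities = [("a", "c", 2), ("a", "b", 1)]
flows = [{("a", "b"): 2, ("b", "c"): 2}, {("a", "b"): 1}]

print(is_multicommodity_flow(V, cap, commodities, flows))
```

In this example the two commodities together use all 3 units of capacity on edge
$(a, b)$, so increasing either flow on that edge would violate the aggregate constraint.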

Exercises 


29.2-1 

Put  the  single-pair  shortest-path  linear  program  from  (29.44)-(29.46)  into  standard 
form. 


29.2-2 

Write out explicitly the linear program corresponding to finding the shortest path
from node $s$ to node $y$ in Figure 24.2(a).


29.2-3 

In the single-source shortest-paths problem, we want to find the shortest-path
weights from a source vertex $s$ to all vertices $v \in V$. Given a graph $G$, write a
linear program for which the solution has the property that $d_v$ is the shortest-path
weight from $s$ to $v$ for each vertex $v \in V$.

29.2-4 

Write  out  explicitly  the  linear  program  corresponding  to  finding  the  maximum  flow 
in  Figure  26.1(a). 


29.2-5

Rewrite the linear program for maximum flow (29.47)-(29.50) so that it uses only
$O(V + E)$ constraints.

29.2-6

Write a linear program that, given a bipartite graph $G = (V, E)$, solves the
maximum-bipartite-matching problem.


29.2-7 

In the minimum-cost multicommodity-flow problem, we are given a directed graph
$G = (V, E)$ in which each edge $(u, v) \in E$ has a nonnegative capacity $c(u, v) \ge 0$
and a cost $a(u, v)$. As in the multicommodity-flow problem, we are given $k$ different
commodities, $K_1, K_2, \ldots, K_k$, where we specify commodity $i$ by the triple
$K_i = (s_i, t_i, d_i)$. We define the flow $f_i$ for commodity $i$ and the aggregate flow $f_{uv}$
on edge $(u, v)$ as in the multicommodity-flow problem. A feasible flow is one
in which the aggregate flow on each edge $(u, v)$ is no more than the capacity of
edge $(u, v)$. The cost of a flow is $\sum_{u,v \in V} a(u, v) f_{uv}$, and the goal is to find the
feasible flow of minimum cost. Express this problem as a linear program.


29.3  The  simplex  algorithm 

The simplex algorithm is the classical method for solving linear programs. In contrast
to most of the other algorithms in this book, its running time is not polynomial
in the worst case. It does yield insight into linear programs, however, and is often
remarkably fast in practice.

In addition to having a geometric interpretation, described earlier in this chapter,
the simplex algorithm bears some similarity to Gaussian elimination, discussed in
Section 28.1. Gaussian elimination begins with a system of linear equalities whose
solution is unknown. In each iteration, we rewrite this system in an equivalent
form that has some additional structure. After some number of iterations, we have
rewritten the system so that the solution is simple to obtain. The simplex algorithm
proceeds in a similar manner, and we can view it as Gaussian elimination for
inequalities.

We now describe the main idea behind an iteration of the simplex algorithm.
Associated with each iteration will be a “basic solution” that we can easily obtain
from the slack form of the linear program: set each nonbasic variable to 0 and
compute the values of the basic variables from the equality constraints. An iteration
converts one slack form into an equivalent slack form. The objective value of the
associated basic feasible solution will be no less than that at the previous iteration,
and usually greater. To achieve this increase in the objective value, we choose a
nonbasic variable such that if we were to increase that variable’s value from 0, then
the objective value would increase, too. The amount by which we can increase
the variable is limited by the other constraints. In particular, we raise it until some
basic variable becomes 0. We then rewrite the slack form, exchanging the roles
of that basic variable and the chosen nonbasic variable. Although we have used a
particular setting of the variables to guide the algorithm, and we shall use it in our
proofs, the algorithm does not explicitly maintain this solution. It simply rewrites
the linear program until an optimal solution becomes “obvious.”

An  example  of  the  simplex  algorithm 

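We begin with an extended example. Consider the following linear program in
standard form:

$$
\begin{array}{lrcll}
\text{maximize} & 3x_1 + x_2 + 2x_3 & & & (29.53) \\
\text{subject to} \\
& x_1 + x_2 + 3x_3 & \le & 30 & (29.54) \\
& 2x_1 + 2x_2 + 5x_3 & \le & 24 & (29.55) \\
& 4x_1 + x_2 + 2x_3 & \le & 36 & (29.56) \\
& x_1, x_2, x_3 & \ge & 0\,. & (29.57)
\end{array}
$$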


In order to use the simplex algorithm, we must convert the linear program into
slack form; we saw how to do so in Section 29.1. In addition to being an algebraic
manipulation, slack is a useful algorithmic concept. Recalling from Section 29.1
that each variable has a corresponding nonnegativity constraint, we say that an
equality constraint is tight for a particular setting of its nonbasic variables if they
cause the constraint’s basic variable to become 0. Similarly, a setting of the nonbasic
variables that would make a basic variable become negative violates that
constraint. Thus, the slack variables explicitly maintain how far each constraint is
from being tight, and so they help to determine how much we can increase values
of nonbasic variables without violating any constraints.

Associating the slack variables $x_4$, $x_5$, and $x_6$ with inequalities (29.54)-(29.56),
respectively, and putting the linear program into slack form, we obtain

$$
\begin{array}{rcrcrcrcrl}
z & = & & & 3x_1 & + & x_2 & + & 2x_3 & (29.58) \\
x_4 & = & 30 & - & x_1 & - & x_2 & - & 3x_3 & (29.59) \\
x_5 & = & 24 & - & 2x_1 & - & 2x_2 & - & 5x_3 & (29.60) \\
x_6 & = & 36 & - & 4x_1 & - & x_2 & - & 2x_3\,. & (29.61)
\end{array}
$$

The system of constraints (29.59)-(29.61) has 3 equations and 6 variables. Any
setting of the variables $x_1$, $x_2$, and $x_3$ defines values for $x_4$, $x_5$, and $x_6$; therefore,
we have an infinite number of solutions to this system of equations. A solution is
feasible if all of $x_1, x_2, \ldots, x_6$ are nonnegative, and there can be an infinite number
of feasible solutions as well. The infinite number of possible solutions to a
system such as this one will be useful in later proofs. We focus on the basic solution:
set all the (nonbasic) variables on the right-hand side to 0 and then compute
the values of the (basic) variables on the left-hand side. In this example, the basic
solution is $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_6) = (0, 0, 0, 30, 24, 36)$ and it has objective value
$z = (3 \cdot 0) + (1 \cdot 0) + (2 \cdot 0) = 0$. Observe that this basic solution sets $\bar{x}_i = b_i$
for each $i \in B$. An iteration of the simplex algorithm rewrites the set of equations
and the objective function so as to put a different set of variables on the right-hand
side. Thus, a different basic solution is associated with the rewritten problem.
We emphasize that the rewrite does not in any way change the underlying linear-programming
problem; the problem at one iteration has the identical set of feasible
solutions as the problem at the previous iteration. The problem does, however,
have a different basic solution than that of the previous iteration.
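
The basic solution of the slack form (29.58)-(29.61) can be computed mechanically. The following Python sketch does so; the dictionary layout and the helper names `basic_solution`, `objective`, and `satisfies_slack_form` are assumptions chosen for this illustration.

```python
# Slack form (29.58)-(29.61), stored as x_i = b[i] - sum_j A[i][j] * x_j
# and z = v + sum_j c[j] * x_j.  The dictionary layout is an assumption.
N = [1, 2, 3]                      # indices of nonbasic variables
B = [4, 5, 6]                      # indices of basic variables
A = {4: {1: 1, 2: 1, 3: 3},
     5: {1: 2, 2: 2, 3: 5},
     6: {1: 4, 2: 1, 3: 2}}
b = {4: 30, 5: 24, 6: 36}
c = {1: 3, 2: 1, 3: 2}
v = 0


def basic_solution(N, B, b):
    """Set nonbasic variables to 0; each basic variable x_i then equals b[i]."""
    x = {j: 0 for j in N}
    x.update({i: b[i] for i in B})
    return x


def objective(c, v, x):
    """Evaluate z = v + sum_j c[j] * x[j] over the nonbasic variables."""
    return v + sum(c[j] * x[j] for j in c)


def satisfies_slack_form(N, B, A, b, x):
    """Check every equality constraint of the slack form at the point x."""
    return all(x[i] == b[i] - sum(A[i][j] * x[j] for j in N) for i in B)


x_bar = basic_solution(N, B, b)
z_bar = objective(c, v, x_bar)
feasible = (satisfies_slack_form(N, B, A, b, x_bar)
            and all(value >= 0 for value in x_bar.values()))
print([x_bar[i] for i in sorted(x_bar)], z_bar, feasible)
```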

If  a  basic  solution  is  also  feasible,  we  call  it  a  basic  feasible  solution.  As  we  run 
the  simplex  algorithm,  the  basic  solution  is  almost  always  a  basic  feasible  solution. 
We  shall  see  in  Section  29.5,  however,  that  for  the  first  few  iterations  of  the  simplex 
algorithm,  the  basic  solution  might  not  be  feasible. 

Our goal, in each iteration, is to reformulate the linear program so that the basic
solution has a greater objective value. We select a nonbasic variable $x_e$ whose
coefficient in the objective function is positive, and we increase the value of $x_e$ as
much as possible without violating any of the constraints. The variable $x_e$ becomes
basic, and some other variable $x_l$ becomes nonbasic. The values of other basic
variables and of the objective function may also change.

To continue the example, let’s think about increasing the value of $x_1$. As we
increase $x_1$, the values of $x_4$, $x_5$, and $x_6$ all decrease. Because we have a nonnegativity
constraint for each variable, we cannot allow any of them to become negative.
If $x_1$ increases above 30, then $x_4$ becomes negative, and $x_5$ and $x_6$ become negative
when $x_1$ increases above 12 and 9, respectively. The third constraint (29.61) is
the tightest constraint, and it limits how much we can increase $x_1$. Therefore, we
switch the roles of $x_1$ and $x_6$. We solve equation (29.61) for $x_1$ and obtain

$$
x_1 = 9 - \frac{x_2}{4} - \frac{x_3}{2} - \frac{x_6}{4}\,. \tag{29.62}
$$

To rewrite the other equations with $x_6$ on the right-hand side, we substitute for $x_1$
using equation (29.62). Doing so for equation (29.59), we obtain

$$
\begin{aligned}
x_4 &= 30 - x_1 - x_2 - 3x_3 \\
    &= 30 - \left( 9 - \frac{x_2}{4} - \frac{x_3}{2} - \frac{x_6}{4} \right) - x_2 - 3x_3 \\
    &= 21 - \frac{3x_2}{4} - \frac{5x_3}{2} + \frac{x_6}{4}\,.
\end{aligned}
\tag{29.63}
$$
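
The limits 30, 12, and 9 on $x_1$ come from dividing each constraint's constant term by the coefficient of $x_1$ in that constraint. A short Python sketch of this ratio test follows; the variable names `a_col`, `limits`, and `leaving` are assumptions for illustration.

```python
from fractions import Fraction

# Constant terms and coefficients of x1 in constraints (29.59)-(29.61),
# keyed by the index of each constraint's basic variable.
b = {4: 30, 5: 24, 6: 36}
a_col = {4: 1, 5: 2, 6: 4}

# Raising x1 drives basic variable x_i to 0 when x1 = b[i] / a_col[i].
# Only constraints with a positive coefficient limit the increase.
limits = {i: Fraction(b[i], a_col[i]) for i in b if a_col[i] > 0}

# The tightest constraint determines which basic variable leaves the basis.
leaving = min(limits, key=limits.get)
print(limits, leaving)
```

Exact rationals are used here because later pivots in this example produce fractional coefficients.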


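Similarly, we combine equation (29.62) with constraint (29.60) and with objective
function (29.58) to rewrite our linear program in the following form:

$$
\begin{array}{rcrcrcrcrl}
z & = & 27 & + & \dfrac{x_2}{4} & + & \dfrac{x_3}{2} & - & \dfrac{3x_6}{4} & (29.64) \\[1ex]
x_1 & = & 9 & - & \dfrac{x_2}{4} & - & \dfrac{x_3}{2} & - & \dfrac{x_6}{4} & (29.65) \\[1ex]
x_4 & = & 21 & - & \dfrac{3x_2}{4} & - & \dfrac{5x_3}{2} & + & \dfrac{x_6}{4} & (29.66) \\[1ex]
x_5 & = & 6 & - & \dfrac{3x_2}{2} & - & 4x_3 & + & \dfrac{x_6}{2}\,. & (29.67)
\end{array}
$$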

We call this operation a pivot. As demonstrated above, a pivot chooses a nonbasic
variable $x_e$, called the entering variable, and a basic variable $x_l$, called the leaving
variable, and exchanges their roles.

The  linear  program  described  in  equations  (29.64)-(29.67)  is  equivalent  to  the 
linear  program  described  in  equations  (29.58)-(29.61).  We  perform  two  operations 
in  the  simplex  algorithm:  rewrite  equations  so  that  variables  move  between  the  left- 
hand  side  and  the  right-hand  side,  and  substitute  one  equation  into  another.  The  first 
operation  trivially  creates  an  equivalent  problem,  and  the  second,  by  elementary 
linear  algebra,  also  creates  an  equivalent  problem.  (See  Exercise  29.3-3.) 

To demonstrate this equivalence, observe that our original basic solution
$(0, 0, 0, 30, 24, 36)$ satisfies the new equations (29.65)-(29.67) and has objective value
$27 + (1/4) \cdot 0 + (1/2) \cdot 0 - (3/4) \cdot 36 = 0$. The basic solution associated with the
new linear program sets the nonbasic values to 0 and is $(9, 0, 0, 21, 6, 0)$, with objective
value $z = 27$. Simple arithmetic verifies that this solution also satisfies
equations (29.59)-(29.61) and, when plugged into objective function (29.58), has
objective value $(3 \cdot 9) + (1 \cdot 0) + (2 \cdot 0) = 27$.
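
The equivalence of the two slack forms can also be spot-checked numerically. The sketch below, with the assumed helper names `original` and `pivoted`, evaluates both forms at a grid of sample points using exact rational arithmetic and confirms that they agree; this is a sanity check under those assumptions, not a proof.

```python
from fractions import Fraction as F
from itertools import product


def original(x1, x2, x3):
    """Slack form (29.58)-(29.61): returns (z, x4, x5, x6)."""
    return (3 * x1 + x2 + 2 * x3,
            30 - x1 - x2 - 3 * x3,
            24 - 2 * x1 - 2 * x2 - 5 * x3,
            36 - 4 * x1 - x2 - 2 * x3)


def pivoted(x2, x3, x6):
    """Slack form (29.64)-(29.67): returns (z, x1, x4, x5)."""
    return (27 + x2 / 4 + x3 / 2 - 3 * x6 / 4,
            9 - x2 / 4 - x3 / 2 - x6 / 4,
            21 - 3 * x2 / 4 - 5 * x3 / 2 + x6 / 4,
            6 - 3 * x2 / 2 - 4 * x3 + x6 / 2)


agree = True
for x1, x2, x3 in product([F(0), F(1), F(5, 2), F(9)], repeat=3):
    z, x4, x5, x6 = original(x1, x2, x3)
    # Feed the derived x6 into the pivoted form and compare every quantity.
    z2, y1, y4, y5 = pivoted(x2, x3, x6)
    agree = agree and (z2, y1, y4, y5) == (z, x1, x4, x5)
print(agree)
```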

Continuing the example, we wish to find a new variable whose value we wish to
increase. We do not want to increase $x_6$, since as its value increases, the objective
value decreases. We can attempt to increase either $x_2$ or $x_3$; let us choose $x_3$. How
far can we increase $x_3$ without violating any of the constraints? Constraint (29.65)
limits it to 18, constraint (29.66) limits it to 42/5, and constraint (29.67) limits
it to 3/2. The third constraint is again the tightest one, and therefore we rewrite
the third constraint so that $x_3$ is on the left-hand side and $x_5$ is on the right-hand
side. We then substitute this new equation, $x_3 = 3/2 - 3x_2/8 - x_5/4 + x_6/8$, into
equations (29.64)-(29.66) and obtain the new, but equivalent, system

$$
\begin{array}{rcrcrcrcrl}
z & = & \dfrac{111}{4} & + & \dfrac{x_2}{16} & - & \dfrac{x_5}{8} & - & \dfrac{11x_6}{16} & (29.68) \\[1ex]
x_1 & = & \dfrac{33}{4} & - & \dfrac{x_2}{16} & + & \dfrac{x_5}{8} & - & \dfrac{5x_6}{16} & (29.69) \\[1ex]
x_3 & = & \dfrac{3}{2} & - & \dfrac{3x_2}{8} & - & \dfrac{x_5}{4} & + & \dfrac{x_6}{8} & (29.70) \\[1ex]
x_4 & = & \dfrac{69}{4} & + & \dfrac{3x_2}{16} & + & \dfrac{5x_5}{8} & - & \dfrac{x_6}{16}\,. & (29.71)
\end{array}
$$

This system has the associated basic solution $(33/4, 0, 3/2, 69/4, 0, 0)$, with objective
value 111/4. Now the only way to increase the objective value is to increase
$x_2$. The three constraints give upper bounds of 132, 4, and $\infty$, respectively.
(We get an upper bound of $\infty$ from constraint (29.71) because, as we increase $x_2$,
the value of the basic variable $x_4$ increases also. This constraint, therefore, places
no restriction on how much we can increase $x_2$.) We increase $x_2$ to 4, and $x_3$ becomes
nonbasic. Then we solve equation (29.70) for $x_2$ and substitute in the other
equations to obtain

$$
\begin{array}{rcrcrcrcrl}
z & = & 28 & - & \dfrac{x_3}{6} & - & \dfrac{x_5}{6} & - & \dfrac{2x_6}{3} & (29.72) \\[1ex]
x_1 & = & 8 & + & \dfrac{x_3}{6} & + & \dfrac{x_5}{6} & - & \dfrac{x_6}{3} & (29.73) \\[1ex]
x_2 & = & 4 & - & \dfrac{8x_3}{3} & - & \dfrac{2x_5}{3} & + & \dfrac{x_6}{3} & (29.74) \\[1ex]
x_4 & = & 18 & - & \dfrac{x_3}{2} & + & \dfrac{x_5}{2}\,. & & & (29.75)
\end{array}
$$

At this point, all coefficients in the objective function are negative. As we shall see
later in this chapter, this situation occurs only when we have rewritten the linear
program so that the basic solution is an optimal solution. Thus, for this problem,
the solution $(8, 4, 0, 18, 0, 0)$, with objective value 28, is optimal. We can now
return to our original linear program given in (29.53)-(29.57). The only variables
in the original linear program are $x_1$, $x_2$, and $x_3$, and so our solution is $x_1 = 8$,
$x_2 = 4$, and $x_3 = 0$, with objective value $(3 \cdot 8) + (1 \cdot 4) + (2 \cdot 0) = 28$. Note
that the values of the slack variables in the final solution measure how much slack
remains in each inequality. Slack variable $x_4$ is 18, and in inequality (29.54), the
left-hand side, with value $8 + 4 + 0 = 12$, is 18 less than the right-hand side of 30.
Slack variables $x_5$ and $x_6$ are 0 and indeed, in inequalities (29.55) and (29.56),
the left-hand and right-hand sides are equal. Observe also that even though the
coefficients in the original slack form are integral, the coefficients in the other
linear programs are not necessarily integral, and the intermediate solutions are not
necessarily integral. Furthermore, the final solution to a linear program need not
be integral; it is purely coincidental that this example has an integral solution.

Pivoting 

We now formalize the procedure for pivoting. The procedure Pivot takes as input
a slack form, given by the tuple $(N, B, A, b, c, v)$, the index $l$ of the leaving
variable $x_l$, and the index $e$ of the entering variable $x_e$. It returns the tuple
$(\hat{N}, \hat{B}, \hat{A}, \hat{b}, \hat{c}, \hat{v})$ describing the new slack form. (Recall again that the entries of
the $m \times n$ matrices $A$ and $\hat{A}$ are actually the negatives of the coefficients that appear
in the slack form.)

Pivot(N, B, A, b, c, v, l, e)

 1  // Compute the coefficients of the equation for new basic variable $x_e$.
 2  let $\hat{A}$ be a new $m \times n$ matrix
 3  $\hat{b}_e = b_l / a_{le}$
 4  for each $j \in N - \{e\}$
 5      $\hat{a}_{ej} = a_{lj} / a_{le}$
 6  $\hat{a}_{el} = 1 / a_{le}$
 7  // Compute the coefficients of the remaining constraints.
 8  for each $i \in B - \{l\}$
 9      $\hat{b}_i = b_i - a_{ie} \hat{b}_e$
10      for each $j \in N - \{e\}$
11          $\hat{a}_{ij} = a_{ij} - a_{ie} \hat{a}_{ej}$
12      $\hat{a}_{il} = -a_{ie} \hat{a}_{el}$
13  // Compute the objective function.
14  $\hat{v} = v + c_e \hat{b}_e$
15  for each $j \in N - \{e\}$
16      $\hat{c}_j = c_j - c_e \hat{a}_{ej}$
17  $\hat{c}_l = -c_e \hat{a}_{el}$
18  // Compute new sets of basic and nonbasic variables.
19  $\hat{N} = N - \{e\} \cup \{l\}$
20  $\hat{B} = B - \{l\} \cup \{e\}$
21  return $(\hat{N}, \hat{B}, \hat{A}, \hat{b}, \hat{c}, \hat{v})$

Pivot works as follows. Lines 3-6 compute the coefficients in the new equation
for $x_e$ by rewriting the equation that has $x_l$ on the left-hand side to instead have $x_e$
on the left-hand side. Lines 8-12 update the remaining equations by substituting
the right-hand side of this new equation for each occurrence of $x_e$. Lines 14-17
do the same substitution for the objective function, and lines 19 and 20 update the
sets of nonbasic and basic variables. Line 21 returns the new slack form. As given,
if $a_{le} = 0$, Pivot would cause an error by dividing by 0, but as we shall see in the
proofs of Lemmas 29.2 and 29.12, we call Pivot only when $a_{le} \ne 0$.
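
For readers who want to experiment, here is one possible Python transcription of Pivot. It is a sketch under stated assumptions rather than a reference implementation: sets hold the index sets $N$ and $B$, dictionaries keyed by variable index hold $A$, $b$, and $c$, and `fractions.Fraction` keeps the arithmetic exact. Applied to the slack form (29.58)-(29.61) with $x_1$ entering and $x_6$ leaving, it reproduces (29.64)-(29.67).

```python
from fractions import Fraction


def pivot(N, B, A, b, c, v, l, e):
    """Exchange leaving variable x_l with entering variable x_e.

    A[i, j] is the NEGATIVE of the coefficient of x_j in the equation
    for basic variable x_i, matching the convention in the text.
    """
    A2, b2, c2 = {}, {}, {}
    # Coefficients of the equation for the new basic variable x_e.
    b2[e] = b[l] / A[l, e]
    for j in N - {e}:
        A2[e, j] = A[l, j] / A[l, e]
    A2[e, l] = 1 / A[l, e]
    # Coefficients of the remaining constraints.
    for i in B - {l}:
        b2[i] = b[i] - A[i, e] * b2[e]
        for j in N - {e}:
            A2[i, j] = A[i, j] - A[i, e] * A2[e, j]
        A2[i, l] = -A[i, e] * A2[e, l]
    # Objective function.
    v2 = v + c[e] * b2[e]
    for j in N - {e}:
        c2[j] = c[j] - c[e] * A2[e, j]
    c2[l] = -c[e] * A2[e, l]
    # New sets of nonbasic and basic variables.
    return (N - {e}) | {l}, (B - {l}) | {e}, A2, b2, c2, v2


# Slack form (29.58)-(29.61).
N, B = {1, 2, 3}, {4, 5, 6}
rows = {4: (1, 1, 3), 5: (2, 2, 5), 6: (4, 1, 2)}
A = {(i, j): Fraction(a) for i, row in rows.items() for j, a in zip((1, 2, 3), row)}
b = {4: Fraction(30), 5: Fraction(24), 6: Fraction(36)}
c = {1: Fraction(3), 2: Fraction(1), 3: Fraction(2)}
v = Fraction(0)

N2, B2, A2, b2, c2, v2 = pivot(N, B, A, b, c, v, l=6, e=1)
print(sorted(B2), b2, c2, v2)
```

The function builds fresh dictionaries instead of updating in place, which mirrors the hatted quantities in the pseudocode and leaves the input slack form untouched.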

We  now  summarize  the  effect  that  PIVOT  has  on  the  values  of  the  variables  in 
the  basic  solution. 

Lemma  29.1 

Consider a call to Pivot(N, B, A, b, c, v, l, e) in which $a_{le} \ne 0$. Let the values
returned from the call be $(\hat{N}, \hat{B}, \hat{A}, \hat{b}, \hat{c}, \hat{v})$, and let $\bar{x}$ denote the basic solution after
the call. Then

1. $\bar{x}_j = 0$ for each $j \in \hat{N}$.

2. $\bar{x}_e = b_l / a_{le}$.

3. $\bar{x}_i = b_i - a_{ie} \hat{b}_e$ for each $i \in \hat{B} - \{e\}$.

Proof The first statement is true because the basic solution always sets all nonbasic
variables to 0. When we set each nonbasic variable to 0 in a constraint

$$
x_i = \hat{b}_i - \sum_{j \in \hat{N}} \hat{a}_{ij} x_j\,,
$$

we have that $\bar{x}_i = \hat{b}_i$ for each $i \in \hat{B}$. Since $e \in \hat{B}$, line 3 of Pivot gives

$$
\bar{x}_e = \hat{b}_e = b_l / a_{le}\,,
$$

which proves the second statement. Similarly, using line 9 for each $i \in \hat{B} - \{e\}$,
we have

$$
\bar{x}_i = \hat{b}_i = b_i - a_{ie} \hat{b}_e\,,
$$

which proves the third statement. ■


The  formal  simplex  algorithm 

We  are  now  ready  to  formalize  the  simplex  algorithm,  which  we  demonstrated  by 
example.  That  example  was  a  particularly  nice  one,  and  we  could  have  had  several 
other  issues  to  address: 

•  How  do  we  determine  whether  a  linear  program  is  feasible? 

•  What  do  we  do  if  the  linear  program  is  feasible,  but  the  initial  basic  solution  is 
not  feasible? 

•  How  do  we  determine  whether  a  linear  program  is  unbounded? 

•  How do we choose the entering and leaving variables?

In Section 29.5, we shall show how to determine whether a problem is feasible,
and if so, how to find a slack form in which the initial basic solution is feasible.
Therefore, let us assume that we have a procedure Initialize-Simplex(A, b, c)
that takes as input a linear program in standard form, that is, an $m \times n$ matrix
$A = (a_{ij})$, an $m$-vector $b = (b_i)$, and an $n$-vector $c = (c_j)$. If the problem is
infeasible, the procedure returns a message that the program is infeasible and then
terminates. Otherwise, the procedure returns a slack form for which the initial
basic solution is feasible.

The procedure Simplex takes as input a linear program in standard form, as just
described. It returns an $n$-vector $\bar{x} = (\bar{x}_j)$ that is an optimal solution to the linear
program described in (29.19)-(29.21).

Simplex(A, b, c)

 1  (N, B, A, b, c, v) = Initialize-Simplex(A, b, c)
 2  let $\Delta$ be a new vector of length n
 3  while some index $j \in N$ has $c_j > 0$
 4      choose an index $e \in N$ for which $c_e > 0$
 5      for each index $i \in B$
 6          if $a_{ie} > 0$
 7              $\Delta_i = b_i / a_{ie}$
 8          else $\Delta_i = \infty$
 9      choose an index $l \in B$ that minimizes $\Delta_l$
10      if $\Delta_l == \infty$
11          return “unbounded”
12      else (N, B, A, b, c, v) = Pivot(N, B, A, b, c, v, l, e)
13  for i = 1 to n
14      if $i \in B$
15          $\bar{x}_i = b_i$
16      else $\bar{x}_i = 0$
17  return $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$


The Simplex procedure works as follows. In line 1, it calls the procedure
Initialize-Simplex(A, b, c), described above, which either determines that the
linear program is infeasible or returns a slack form for which the basic solution is
feasible. The while loop of lines 3-12 forms the main part of the algorithm. If no
coefficient in the objective function is positive, then the while loop terminates.
Otherwise, line 4 selects a variable $x_e$, whose coefficient in the objective function
is positive, as the entering variable. Although we may choose any such variable as
the entering variable, we assume that we use some prespecified deterministic rule.
Next, lines 5-9 check each constraint and pick the one that most severely limits
the amount by which we can increase $x_e$ without violating any of the nonnegativity
constraints; the basic variable associated with this constraint is $x_l$. Again, we
are free to choose one of several variables as the leaving variable, but we assume
that we use some prespecified deterministic rule. If none of the constraints limits
the amount by which the entering variable can increase, the algorithm returns
“unbounded” in line 11. Otherwise, line 12 exchanges the roles of the entering
and leaving variables by calling Pivot(N, B, A, b, c, v, l, e), as described above.
Lines 13-16 compute a solution $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n$ for the original linear-programming
variables by setting all the nonbasic variables to 0 and each basic variable $\bar{x}_i$ to $b_i$,
and line 17 returns these values.
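
The main loop of Simplex can be sketched in Python as follows. Several assumptions apply: the sketch starts from a slack form whose basic solution is already feasible (it does not implement Initialize-Simplex), it picks the smallest eligible index as the entering variable and breaks ratio ties by smallest index, and it returns the objective value alongside the solution for convenience. The function names `pivot` and `simplex` and the dictionary layout are choices made for this illustration.

```python
from fractions import Fraction


def pivot(N, B, A, b, c, v, l, e):
    """Exchange leaving variable x_l with entering variable x_e."""
    A2, b2, c2 = {}, {}, {}
    b2[e] = b[l] / A[l, e]
    for j in N - {e}:
        A2[e, j] = A[l, j] / A[l, e]
    A2[e, l] = 1 / A[l, e]
    for i in B - {l}:
        b2[i] = b[i] - A[i, e] * b2[e]
        for j in N - {e}:
            A2[i, j] = A[i, j] - A[i, e] * A2[e, j]
        A2[i, l] = -A[i, e] * A2[e, l]
    v2 = v + c[e] * b2[e]
    for j in N - {e}:
        c2[j] = c[j] - c[e] * A2[e, j]
    c2[l] = -c[e] * A2[e, l]
    return (N - {e}) | {l}, (B - {l}) | {e}, A2, b2, c2, v2


def simplex(N, B, A, b, c, v, n):
    """Run the while loop of Simplex from a feasible slack form."""
    while any(c[j] > 0 for j in N):
        # Entering variable: smallest index with a positive coefficient
        # (one possible deterministic rule, assumed here).
        e = min(j for j in N if c[j] > 0)
        # Ratio test; constraints with a_ie <= 0 impose no limit.
        ratios = {i: b[i] / A[i, e] for i in B if A[i, e] > 0}
        if not ratios:
            return "unbounded"
        l = min(ratios, key=lambda i: (ratios[i], i))
        N, B, A, b, c, v = pivot(N, B, A, b, c, v, l, e)
    return [b[i] if i in B else 0 for i in range(1, n + 1)], v


# Slack form (29.58)-(29.61); its basic solution is feasible.
rows = {4: (1, 1, 3), 5: (2, 2, 5), 6: (4, 1, 2)}
A = {(i, j): Fraction(a) for i, row in rows.items() for j, a in zip((1, 2, 3), row)}
b = {4: Fraction(30), 5: Fraction(24), 6: Fraction(36)}
c = {1: Fraction(3), 2: Fraction(1), 3: Fraction(2)}

x, z = simplex({1, 2, 3}, {4, 5, 6}, A, b, c, Fraction(0), n=3)
print(x, z)
```

Because the entering rule differs from the choice made in the worked example (which picked $x_3$ on the second iteration), the sequence of pivots differs, but the run ends at the same optimal solution $(8, 4, 0)$ with objective value 28.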

To  show  that  Simplex  is  correct,  we  first  show  that  if  Simplex  has  an  initial 
feasible  solution  and  eventually  terminates,  then  it  either  returns  a  feasible  solution 
or  determines  that  the  linear  program  is  unbounded.  Then,  we  show  that  Simplex 
terminates.  Finally,  in  Section  29.4  (Theorem  29.10)  we  show  that  the  solution 
returned  is  optimal. 

Lemma  29.2 

Given a linear program $(A, b, c)$, suppose that the call to Initialize-Simplex in
line 1 of Simplex returns a slack form for which the basic solution is feasible.
Then if Simplex returns a solution in line 17, that solution is a feasible solution to
the linear program. If Simplex returns “unbounded” in line 11, the linear program
is unbounded.

Proof We use the following three-part loop invariant:

At the start of each iteration of the while loop of lines 3-12,

1. the slack form is equivalent to the slack form returned by the call of
Initialize-Simplex,

2. for each $i \in B$, we have $b_i \ge 0$, and

3. the basic solution associated with the slack form is feasible.

Initialization: The equivalence of the slack forms is trivial for the first iteration.
We assume, in the statement of the lemma, that the call to Initialize-Simplex
in line 1 of Simplex returns a slack form for which the basic solution
is feasible. Thus, the third part of the invariant is true. Because the basic solution
is feasible, each basic variable $x_i$ is nonnegative. Furthermore, since the
basic solution sets each basic variable $x_i$ to $b_i$, we have that $b_i \ge 0$ for all
$i \in B$. Thus, the second part of the invariant holds.

Maintenance: We shall show that each iteration of the while loop maintains the
loop invariant, assuming that the return statement in line 11 does not execute.
We shall handle the case in which line 11 executes when we discuss termination.

An iteration of the while loop exchanges the role of a basic and a nonbasic
variable by calling the Pivot procedure. By Exercise 29.3-3, the slack form is
equivalent to the one from the previous iteration which, by the loop invariant,
is equivalent to the initial slack form.

We now demonstrate the second part of the loop invariant. We assume that at
the start of each iteration of the while loop, $b_i \ge 0$ for each $i \in B$, and we shall
show that these inequalities remain true after the call to Pivot in line 12. Since
the only changes to the variables $b_i$ and the set $B$ of basic variables occur in this
assignment, it suffices to show that line 12 maintains this part of the invariant.
We let $b_i$, $a_{ij}$, and $B$ refer to values before the call of Pivot, and $\hat{b}_i$ refer to
values returned from Pivot.

First, we observe that $\hat{b}_e \ge 0$ because $b_l \ge 0$ by the loop invariant, $a_{le} > 0$ by
lines 6 and 9 of Simplex, and $\hat{b}_e = b_l / a_{le}$ by line 3 of Pivot.

For the remaining indices $i \in B - \{l\}$, we have that

$$
\begin{aligned}
\hat{b}_i &= b_i - a_{ie} \hat{b}_e && \text{(by line 9 of Pivot)} \\
          &= b_i - a_{ie} (b_l / a_{le}) && \text{(by line 3 of Pivot)}.
\end{aligned}
\tag{29.76}
$$

We have two cases to consider, depending on whether $a_{ie} > 0$ or $a_{ie} \le 0$.
If $a_{ie} > 0$, then since we chose $l$ such that

$$
b_l / a_{le} \le b_i / a_{ie} \quad \text{for all } i \in B\,, \tag{29.77}
$$

we have

$$
\begin{aligned}
\hat{b}_i &= b_i - a_{ie} (b_l / a_{le}) && \text{(by equation (29.76))} \\
          &\ge b_i - a_{ie} (b_i / a_{ie}) && \text{(by inequality (29.77))} \\
          &= b_i - b_i \\
          &= 0\,,
\end{aligned}
$$

and thus $\hat{b}_i \ge 0$. If $a_{ie} \le 0$, then because $a_{le}$, $b_i$, and $b_l$ are all nonnegative,
equation (29.76) implies that $\hat{b}_i$ must be nonnegative, too.

We now argue that the basic solution is feasible, i.e., that all variables have nonnegative
values. The nonbasic variables are set to 0 and thus are nonnegative.
Each basic variable $x_i$ is defined by the equation

$$
x_i = b_i - \sum_{j \in N} a_{ij} x_j\,.
$$

The basic solution sets $\bar{x}_i = b_i$. Using the second part of the loop invariant, we
conclude that each basic variable $\bar{x}_i$ is nonnegative.




Termination: The while loop can terminate in one of two ways. If it terminates
because of the condition in line 3, then the current basic solution is feasible and
line 17 returns this solution. The other way it terminates is by returning “unbounded”
in line 11. In this case, for each iteration of the for loop in lines 5-8,
when line 6 is executed, we find that $a_{ie} \le 0$. Consider the solution $\bar{x}$ defined as

$$
\bar{x}_i =
\begin{cases}
\infty & \text{if } i = e\,, \\
0 & \text{if } i \in N - \{e\}\,, \\
b_i - \sum_{j \in N} a_{ij} \bar{x}_j & \text{if } i \in B\,.
\end{cases}
$$

We now show that this solution is feasible, i.e., that all variables are nonnegative.
The nonbasic variables other than $\bar{x}_e$ are 0, and $\bar{x}_e = \infty > 0$; thus all
nonbasic variables are nonnegative. For each basic variable $\bar{x}_i$, we have

$$
\begin{aligned}
\bar{x}_i &= b_i - \sum_{j \in N} a_{ij} \bar{x}_j \\
          &= b_i - a_{ie} \bar{x}_e\,.
\end{aligned}
$$

The loop invariant implies that $b_i \ge 0$, and we have $a_{ie} \le 0$ and $\bar{x}_e = \infty > 0$.
Thus, $\bar{x}_i \ge 0$.

Now we show that the objective value for the solution $\bar{x}$ is unbounded. From
equation (29.42), the objective value is

$$
\begin{aligned}
z &= v + \sum_{j \in N} c_j \bar{x}_j \\
  &= v + c_e \bar{x}_e\,.
\end{aligned}
$$

Since $c_e > 0$ (by line 4 of Simplex) and $\bar{x}_e = \infty$, the objective value is $\infty$,
and thus the linear program is unbounded. ■

It  remains  to  show  that  Simplex  terminates,  and  when  it  does  terminate,  the 
solution  it  returns  is  optimal.  Section  29.4  will  address  optimality.  We  now  discuss 
termination. 

Termination 

In the example given in the beginning of this section, each iteration of the simplex
algorithm increased the objective value associated with the basic solution. As
Exercise 29.3-2 asks you to show, no iteration of Simplex can decrease the objective
value associated with the basic solution. Unfortunately, it is possible that an iteration
leaves the objective value unchanged. This phenomenon is called degeneracy,
and we shall now study it in greater detail.

The assignment in line 14 of Pivot, $\hat{v} = v + c_e \hat{b}_e$, changes the objective value.
Since Simplex calls Pivot only when $c_e > 0$, the only way for the objective
value to remain unchanged (i.e., $\hat{v} = v$) is for $\hat{b}_e$ to be 0. This value is assigned
as $\hat{b}_e = b_l / a_{le}$ in line 3 of Pivot. Since we always call Pivot with $a_{le} \ne 0$, we
see that for $\hat{b}_e$ to equal 0, and hence the objective value to be unchanged, we must
have $b_l = 0$.

Indeed, this situation can occur. Consider the linear program

$$
\begin{array}{rcl}
z & = & x_1 + x_2 + x_3 \\
x_4 & = & 8 - x_1 - x_2 \\
x_5 & = & x_2 - x_3\,.
\end{array}
$$

Suppose that we choose $x_1$ as the entering variable and $x_4$ as the leaving variable.
After pivoting, we obtain

$$
\begin{array}{rcl}
z & = & 8 + x_3 - x_4 \\
x_1 & = & 8 - x_2 - x_4 \\
x_5 & = & x_2 - x_3\,.
\end{array}
$$

At this point, our only choice is to pivot with $x_3$ entering and $x_5$ leaving. Since
$b_5 = 0$, the objective value of 8 remains unchanged after pivoting:


$$
\begin{array}{rcl}
z & = & 8 + x_2 - x_4 - x_5 \\
x_1 & = & 8 - x_2 - x_4 \\
x_3 & = & x_2 - x_5\,.
\end{array}
$$

The objective value has not changed, but our slack form has. Fortunately, if we
pivot again, with $x_2$ entering and $x_1$ leaving, the objective value increases (to 16),
and the simplex algorithm can continue.
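
The degenerate pivot can be replayed in code. The following Python sketch applies the three pivots described above and records the objective value after each one; the `pivot` function is an assumed transcription of the Pivot pseudocode using exact rational arithmetic, and the dictionary layout is a choice made for this example.

```python
from fractions import Fraction as F


def pivot(N, B, A, b, c, v, l, e):
    """Exchange leaving variable x_l with entering variable x_e."""
    A2, b2, c2 = {}, {}, {}
    b2[e] = b[l] / A[l, e]
    for j in N - {e}:
        A2[e, j] = A[l, j] / A[l, e]
    A2[e, l] = 1 / A[l, e]
    for i in B - {l}:
        b2[i] = b[i] - A[i, e] * b2[e]
        for j in N - {e}:
            A2[i, j] = A[i, j] - A[i, e] * A2[e, j]
        A2[i, l] = -A[i, e] * A2[e, l]
    v2 = v + c[e] * b2[e]
    for j in N - {e}:
        c2[j] = c[j] - c[e] * A2[e, j]
    c2[l] = -c[e] * A2[e, l]
    return (N - {e}) | {l}, (B - {l}) | {e}, A2, b2, c2, v2


# z = x1 + x2 + x3,  x4 = 8 - x1 - x2,  x5 = x2 - x3.
# A holds the negatives of the coefficients that appear in the slack form.
N, B = {1, 2, 3}, {4, 5}
A = {(4, 1): F(1), (4, 2): F(1), (4, 3): F(0),
     (5, 1): F(0), (5, 2): F(-1), (5, 3): F(1)}
b = {4: F(8), 5: F(0)}
c = {1: F(1), 2: F(1), 3: F(1)}
v = F(0)

values = []
for l, e in [(4, 1), (5, 3), (1, 2)]:   # (leaving, entering) pairs from the text
    N, B, A, b, c, v = pivot(N, B, A, b, c, v, l, e)
    values.append(v)
print(values)
```

The second pivot leaves the objective value at 8 because the leaving variable's constant term $b_5$ is 0, which is exactly the degenerate situation described in the text.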

Degeneracy can prevent the simplex algorithm from terminating, because it can
lead to a phenomenon known as cycling: the slack forms at two different iterations
of Simplex are identical. Because of degeneracy, Simplex could choose a
sequence of pivot operations that leave the objective value unchanged but repeat
a slack form within the sequence. Since Simplex is a deterministic algorithm, if
it cycles, then it will cycle through the same series of slack forms forever, never
terminating.

Cycling  is  the  only  reason  that  Simplex  might  not  terminate.  To  show  this  fact, 
we  must  first  develop  some  additional  machinery. 

At  each  iteration,  Simplex  maintains  A,  b,  c,  and  v  in  addition  to  the  sets 
N  and  B.  Although  we  need  to  explicitly  maintain  A,  b,  c,  and  v  in  order  to 
implement  the  simplex  algorithm  efficiently,  we  can  get  by  without  maintaining 
them.  In  other  words,  the  sets  of  basic  and  nonbasic  variables  suffice  to  uniquely 
determine the slack form. Before proving this fact, we prove a useful algebraic
lemma.

Lemma 29.3

Let $I$ be a set of indices. For each $j \in I$, let $\alpha_j$ and $\beta_j$ be real numbers, and let $x_j$
be a real-valued variable. Let $\gamma$ be any real number. Suppose that for any settings
of the $x_j$, we have

$$
\sum_{j \in I} \alpha_j x_j = \gamma + \sum_{j \in I} \beta_j x_j\,. \tag{29.78}
$$

Then $\alpha_j = \beta_j$ for each $j \in I$, and $\gamma = 0$.

Proof Since equation (29.78) holds for any values of the $x_j$, we can use particular
values to draw conclusions about $\alpha$, $\beta$, and $\gamma$. If we let $x_j = 0$ for each $j \in I$,
we conclude that $\gamma = 0$. Now pick an arbitrary index $j \in I$, and set $x_j = 1$ and
$x_k = 0$ for all $k \ne j$. Then we must have $\alpha_j = \beta_j$. Since we picked $j$ as any
index in $I$, we conclude that $\alpha_j = \beta_j$ for each $j \in I$. ■

A particular linear program has many different slack forms; recall that each slack
form has the same set of feasible and optimal solutions as the original linear program.
We now show that the slack form of a linear program is uniquely determined
by the set of basic variables. That is, given the set of basic variables, a unique slack
form (unique set of coefficients and right-hand sides) is associated with those basic
variables.

Lemma  29.4 

Let $(A, b, c)$ be a linear program in standard form. Given a set $B$ of basic variables,
the associated slack form is uniquely determined.

Proof Assume for the purpose of contradiction that there are two different slack
forms with the same set $B$ of basic variables. The slack forms must also have
identical sets $N = \{1, 2, \ldots, n + m\} - B$ of nonbasic variables. We write the first
slack form as

$$z = v + \sum_{j \in N} c_j x_j \tag{29.79}$$

$$x_i = b_i - \sum_{j \in N} a_{ij} x_j \quad \text{for } i \in B , \tag{29.80}$$

and the second as

$$z = v' + \sum_{j \in N} c'_j x_j \tag{29.81}$$

$$x_i = b'_i - \sum_{j \in N} a'_{ij} x_j \quad \text{for } i \in B . \tag{29.82}$$

Consider the system of equations formed by subtracting each equation in
line (29.82) from the corresponding equation in line (29.80). The resulting
system is

$$0 = (b_i - b'_i) - \sum_{j \in N} (a_{ij} - a'_{ij}) x_j \quad \text{for } i \in B$$

or, equivalently,

$$\sum_{j \in N} a_{ij} x_j = (b_i - b'_i) + \sum_{j \in N} a'_{ij} x_j \quad \text{for } i \in B .$$

Now, for each $i \in B$, apply Lemma 29.3 with $\alpha_j = a_{ij}$, $\beta_j = a'_{ij}$, $\gamma = b_i - b'_i$, and
$I = N$. Since $\alpha_j = \beta_j$, we have that $a_{ij} = a'_{ij}$ for each $j \in N$, and since $\gamma = 0$,
we have that $b_i = b'_i$. Thus, for the two slack forms, $A$ and $b$ are identical to $A'$
and $b'$. Using a similar argument, Exercise 29.3-1 shows that it must also be the
case that $c = c'$ and $v = v'$, and hence that the slack forms must be identical. ■

We  can  now  show  that  cycling  is  the  only  possible  reason  that  Simplex  might 
not  terminate. 

Lemma  29.5 

If Simplex fails to terminate in at most $\binom{n+m}{m}$ iterations, then it cycles.

Proof By Lemma 29.4, the set $B$ of basic variables uniquely determines a slack
form. There are $n + m$ variables and $|B| = m$, and therefore there are at most
$\binom{n+m}{m}$ ways to choose $B$. Thus, there are at most $\binom{n+m}{m}$ unique slack forms.
Therefore, if Simplex runs for more than $\binom{n+m}{m}$ iterations, it must cycle. ■
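
To get a feel for the size of this bound, the following minimal sketch evaluates the binomial coefficient with Python's standard `math.comb`. The function name `max_slack_forms` is ours, introduced only for illustration; the values n = 3 and m = 3 match the running example (29.53)–(29.57), whose final slack form has three basic and three nonbasic variables.

```python
from math import comb

def max_slack_forms(n: int, m: int) -> int:
    """Upper bound of Lemma 29.5: ways to choose m basic variables out of n + m."""
    return comb(n + m, m)

# n = 3 original variables and m = 3 constraints.
print(max_slack_forms(3, 3))    # 20

# The bound grows very quickly with the size of the program.
print(max_slack_forms(50, 50))
```

The bound is finite but exponential in general, which is why it serves as a termination argument rather than as a useful running-time estimate.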

Cycling is theoretically possible, but extremely rare. We can prevent it by choosing
the entering and leaving variables somewhat more carefully. One option is to
perturb the input slightly so that it is impossible to have two solutions with the
same objective value. Another option is to break ties by always choosing the variable
with the smallest index, a strategy known as Bland's rule. We omit the proof
that these strategies avoid cycling.

Lemma  29.6 

If  lines  4  and  9  of  Simplex  always  break  ties  by  choosing  the  variable  with  the 
smallest  index,  then  Simplex  must  terminate.  ■ 
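
The smallest-index tie-breaking of Lemma 29.6 can be sketched as follows. This is an illustration under our own assumptions, not the book's code: the slack form is stored in dictionaries keyed by variable index (`c[j]`, `b[i]`, `A[i][j]`, with constraints read as x_i = b_i − Σ_j a_ij x_j), and the helper names `choose_entering` and `choose_leaving` are hypothetical. The small instance at the end is invented to force a tie.

```python
from fractions import Fraction

def choose_entering(c, N):
    """Smallest-index nonbasic variable with a positive objective coefficient,
    or None if no coefficient is positive (the current solution is optimal)."""
    candidates = [j for j in N if c[j] > 0]
    return min(candidates) if candidates else None

def choose_leaving(A, b, B, e):
    """Minimum-ratio test for entering variable e. Ties are broken in favor of
    the smallest basic index. Returns None if no constraint limits x_e."""
    best = None
    for i in sorted(B):                      # scan basic indices in increasing order
        if A[i][e] > 0:
            ratio = Fraction(b[i]) / Fraction(A[i][e])
            if best is None or ratio < best[0]:   # strict '<' keeps the earlier index on ties
                best = (ratio, i)
    return None if best is None else best[1]

# Invented instance: both objective coefficients are positive, and both
# constraints give the same ratio 4 for entering variable x_1.
N, B = {1, 2}, {3, 4}
c = {1: 2, 2: 2}
b = {3: 4, 4: 4}
A = {3: {1: 1, 2: 0}, 4: {1: 1, 2: 1}}

e = choose_entering(c, N)
l = choose_leaving(A, b, B, e)
print(e, l)    # 1 3
```

Exact `Fraction` arithmetic is used so that two ratios that are mathematically equal really compare as equal; with floating point, a tie could be missed.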


We conclude this section with the following lemma.

Lemma 29.7

Assuming that Initialize-Simplex returns a slack form for which the basic
solution is feasible, Simplex either reports that a linear program is unbounded, or it
terminates with a feasible solution in at most $\binom{n+m}{m}$ iterations.

Proof Lemmas 29.2 and 29.6 show that if Initialize-Simplex returns a slack
form for which the basic solution is feasible, Simplex either reports that a linear
program is unbounded, or it terminates with a feasible solution. By the
contrapositive of Lemma 29.5, if Simplex terminates with a feasible solution, then it
terminates in at most $\binom{n+m}{m}$ iterations. ■

Exercises 


29.3-1 

Complete  the  proof  of  Lemma  29.4  by  showing  that  it  must  be  the  case  that  c  =  c' 
and  v  =  v' . 


29.3-2 

Show  that  the  call  to  Pivot  in  line  12  of  Simplex  never  decreases  the  value  of  v. 


29.3-3 

Prove  that  the  slack  form  given  to  the  PIVOT  procedure  and  the  slack  form  that  the 
procedure  returns  are  equivalent. 


29.3-4 

Suppose we convert a linear program $(A, b, c)$ in standard form to slack form.
Show that the basic solution is feasible if and only if $b_i \ge 0$ for $i = 1, 2, \ldots, m$.


29.3-5 

Solve  the  following  linear  program  using  Simplex: 

maximize    $18x_1 + 12.5x_2$

subject to

            $x_1 + x_2 \le 20$
            $x_1 \le 12$
            $x_2 \le 16$
            $x_1, x_2 \ge 0$ .

29.3-6

Solve  the  following  linear  program  using  Simplex: 
maximize    $5x_1 - 3x_2$

subject to

            $x_1 - x_2 \le 1$
            $2x_1 + x_2 \le 2$
            $x_1, x_2 \ge 0$ .


29.3-7 

Solve  the  following  linear  program  using  Simplex: 


minimize    $x_1 + x_2 + x_3$

subject to

            $2x_1 + 7.5x_2 + 3x_3 \ge 10000$
            $20x_1 + 5x_2 + 10x_3 \ge 30000$
            $x_1, x_2, x_3 \ge 0$ .

29.3-8 

In the proof of Lemma 29.5, we argued that there are at most $\binom{m+n}{n}$ ways to choose
a set $B$ of basic variables. Give an example of a linear program in which there are
strictly fewer than $\binom{m+n}{n}$ ways to choose the set $B$.


29.4  Duality 

We have proven that, under certain assumptions, Simplex terminates. We have not
yet shown that it actually finds an optimal solution to a linear program, however.
In order to do so, we introduce a powerful concept called linear-programming
duality.

Duality enables us to prove that a solution is indeed optimal. We saw an example
of duality in Chapter 26 with Theorem 26.6, the max-flow min-cut theorem.
Suppose that, given an instance of a maximum-flow problem, we find a flow $f$
with value $|f|$. How do we know whether $f$ is a maximum flow? By the max-flow
min-cut theorem, if we can find a cut whose value is also $|f|$, then we have
verified that $f$ is indeed a maximum flow. This relationship provides an example of
duality: given a maximization problem, we define a related minimization problem
such that the two problems have the same optimal objective values.

Given a linear program in which the objective is to maximize, we shall describe
how to formulate a dual linear program in which the objective is to minimize and
whose optimal value is identical to that of the original linear program. When
referring to dual linear programs, we call the original linear program the primal.

Given a primal linear program in standard form, as in (29.16)–(29.18), we define
the dual linear program as

minimize    $\sum_{i=1}^{m} b_i y_i$                                              (29.83)

subject to

            $\sum_{i=1}^{m} a_{ij} y_i \ge c_j$   for $j = 1, 2, \ldots, n$ ,            (29.84)

            $y_i \ge 0$   for $i = 1, 2, \ldots, m$ .                              (29.85)

To form the dual, we change the maximization to a minimization, exchange the
roles of coefficients on the right-hand sides and the objective function, and replace
each less-than-or-equal-to by a greater-than-or-equal-to. Each of the $m$ constraints
in the primal has an associated variable $y_i$ in the dual, and each of the $n$ constraints
in the dual has an associated variable $x_j$ in the primal. For example, consider the
linear program given in (29.53)–(29.57). The dual of this linear program is

minimize    $30y_1 + 24y_2 + 36y_3$                  (29.86)

subject to

            $y_1 + 2y_2 + 4y_3 \ge 3$                 (29.87)
            $y_1 + 2y_2 + y_3 \ge 1$                  (29.88)
            $3y_1 + 5y_2 + 2y_3 \ge 2$                (29.89)
            $y_1, y_2, y_3 \ge 0$ .                   (29.90)
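
The recipe for forming the dual amounts to transposing the constraint matrix and swapping the roles of b and c. The sketch below illustrates this on the example; the function name `dual_of_standard_form` and the list-of-rows layout are our own assumptions, and the primal coefficients are the ones implied by transposing the dual (29.86)–(29.90).

```python
def dual_of_standard_form(A, b, c):
    """For the primal  max c.x  s.t.  A x <= b, x >= 0,  return the data of the dual
    min b.y  s.t.  A^T y >= c, y >= 0  as (constraint rows, right-hand sides, objective)."""
    A_transposed = [list(column) for column in zip(*A)]
    return A_transposed, list(c), list(b)

# Primal (29.53)-(29.57), as implied by the dual shown above.
A = [[1, 1, 3],
     [2, 2, 5],
     [4, 1, 2]]
b = [30, 24, 36]
c = [3, 1, 2]

dual_rows, dual_rhs, dual_objective = dual_of_standard_form(A, b, c)
print(dual_objective)   # [30, 24, 36]  coefficients of (29.86)
print(dual_rows)        # [[1, 2, 4], [1, 2, 1], [3, 5, 2]]  left-hand sides of (29.87)-(29.89)
print(dual_rhs)         # [3, 1, 2]  right-hand sides of (29.87)-(29.89)
```

Note that the returned dual is a minimization with greater-than-or-equal-to constraints, so it is not itself in standard form.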


We shall show in Theorem 29.10 that the optimal value of the dual linear
program is always equal to the optimal value of the primal linear program.
Furthermore, the simplex algorithm actually implicitly solves both the primal and the dual
linear programs simultaneously, thereby providing a proof of optimality.

We begin by demonstrating weak duality, which states that any feasible
solution to the primal linear program has a value no greater than that of any feasible
solution to the dual linear program.

Lemma 29.8 (Weak linear-programming duality)

Let $x$ be any feasible solution to the primal linear program in (29.16)–(29.18) and
let $y$ be any feasible solution to the dual linear program in (29.83)–(29.85). Then,
we have

$$\sum_{j=1}^{n} c_j x_j \le \sum_{i=1}^{m} b_i y_i .$$

Proof We have

$$\begin{aligned}
\sum_{j=1}^{n} c_j x_j &\le \sum_{j=1}^{n} \Bigl( \sum_{i=1}^{m} a_{ij} y_i \Bigr) x_j && \text{(by inequalities (29.84))} \\
&= \sum_{i=1}^{m} \Bigl( \sum_{j=1}^{n} a_{ij} x_j \Bigr) y_i \\
&\le \sum_{i=1}^{m} b_i y_i && \text{(by inequalities (29.17)) .}
\end{aligned}$$
■
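
The chain of inequalities in this proof can be checked numerically on the running example. The sketch below is only an illustration: the feasible points x = (1, 1, 1) and y = (1, 1, 1) are chosen by us, the helper names are hypothetical, and the primal coefficients are those implied by the dual (29.86)–(29.90). The middle quantity is the double sum over a_ij x_j y_i that appears in the proof; the two inequalities rely on x and y being nonnegative.

```python
def primal_feasible(A, b, x):
    """x >= 0 and A x <= b."""
    return all(xj >= 0 for xj in x) and all(
        sum(a * xj for a, xj in zip(row, x)) <= bi for row, bi in zip(A, b))

def dual_feasible(A, c, y):
    """y >= 0 and A^T y >= c."""
    return all(yi >= 0 for yi in y) and all(
        sum(A[i][j] * y[i] for i in range(len(A))) >= c[j] for j in range(len(c)))

def duality_chain(A, b, c, x, y):
    """Return (sum c_j x_j, sum_i sum_j a_ij x_j y_i, sum b_i y_i)."""
    primal_value = sum(cj * xj for cj, xj in zip(c, x))
    middle = sum(A[i][j] * x[j] * y[i] for i in range(len(A)) for j in range(len(c)))
    dual_value = sum(bi * yi for bi, yi in zip(b, y))
    return primal_value, middle, dual_value

A = [[1, 1, 3], [2, 2, 5], [4, 1, 2]]
b = [30, 24, 36]
c = [3, 1, 2]
x = [1, 1, 1]   # feasible for the primal (our choice)
y = [1, 1, 1]   # feasible for the dual (our choice)

print(primal_feasible(A, b, x), dual_feasible(A, c, y))   # True True
print(duality_chain(A, b, c, x, y))                       # (6, 21, 90)
```

Here 6 ≤ 21 ≤ 90, so the primal value is bounded above by the dual value, as the lemma states.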


Corollary  29.9 

Let $x$ be a feasible solution to a primal linear program $(A, b, c)$, and let $y$ be a
feasible solution to the corresponding dual linear program. If

$$\sum_{j=1}^{n} c_j x_j = \sum_{i=1}^{m} b_i y_i ,$$

then $x$ and $y$ are optimal solutions to the primal and dual linear programs,
respectively.

Proof  By  Lemma  29.8,  the  objective  value  of  a  feasible  solution  to  the  primal 
cannot  exceed  that  of  a  feasible  solution  to  the  dual.  The  primal  linear  program  is 
a  maximization  problem  and  the  dual  is  a  minimization  problem.  Thus,  if  feasible 
solutions  x  and  y  have  the  same  objective  value,  neither  can  be  improved.  ■ 

Before proving that there always is a dual solution whose value is equal to that
of an optimal primal solution, we describe how to find such a solution. When
we ran the simplex algorithm on the linear program in (29.53)–(29.57), the final
iteration yielded the slack form (29.72)–(29.75) with objective $z = 28 - x_3/6 - x_5/6 - 2x_6/3$,
$B = \{1, 2, 4\}$, and $N = \{3, 5, 6\}$. As we shall show below, the basic
solution associated with the final slack form is indeed an optimal solution to the
linear program; an optimal solution to linear program (29.53)–(29.57) is therefore
$(x_1, x_2, x_3) = (8, 4, 0)$, with objective value $(3 \cdot 8) + (1 \cdot 4) + (2 \cdot 0) = 28$. As
we also show below, we can read off an optimal dual solution: the negatives of the
coefficients of the primal objective function are the values of the dual variables.
More precisely, suppose that the last slack form of the primal is

$$z = v' + \sum_{j \in N} c'_j x_j$$

$$x_i = b'_i - \sum_{j \in N} a'_{ij} x_j \quad \text{for } i \in B .$$

Then, to produce an optimal dual solution, we set

$$y_i = \begin{cases} -c'_{n+i} & \text{if } (n + i) \in N , \\ 0 & \text{otherwise} . \end{cases} \tag{29.91}$$


Thus, an optimal solution to the dual linear program defined in (29.86)–(29.90)
is $y_1 = 0$ (since $n + 1 = 4 \in B$), $y_2 = -c'_5 = 1/6$, and $y_3 = -c'_6 = 2/3$.
Evaluating the dual objective function (29.86), we obtain an objective value of
$(30 \cdot 0) + (24 \cdot (1/6)) + (36 \cdot (2/3)) = 28$, which confirms that the objective value
of the primal is indeed equal to the objective value of the dual. Combining these
calculations with Lemma 29.8 yields a proof that the optimal objective value of the
primal linear program is 28. We now show that this approach applies in general:
we can find an optimal solution to the dual and simultaneously prove that a solution
to the primal is optimal.
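
The reading-off rule (29.91) and the arithmetic above can be reproduced with exact rational numbers. The following is a minimal sketch under our own conventions: `c_final` maps each nonbasic index j to the coefficient c'_j of the final objective z = 28 − x_3/6 − x_5/6 − 2x_6/3, and the function name `dual_from_final_slack_form` is hypothetical. The dual constraint data are copied from (29.86)–(29.89).

```python
from fractions import Fraction as F

def dual_from_final_slack_form(n, m, N, c_final):
    """Equation (29.91): y_i = -c'_{n+i} if n + i is nonbasic, and 0 otherwise."""
    return [-c_final[n + i] if (n + i) in N else F(0) for i in range(1, m + 1)]

n, m = 3, 3
N = {3, 5, 6}                                        # nonbasic variables of the final slack form
c_final = {3: F(-1, 6), 5: F(-1, 6), 6: F(-2, 3)}    # coefficients c'_j for j in N

b = [30, 24, 36]                                     # dual objective (29.86)
dual_rows = [[1, 2, 4], [1, 2, 1], [3, 5, 2]]        # left-hand sides of (29.87)-(29.89)
dual_rhs = [3, 1, 2]

y = dual_from_final_slack_form(n, m, N, c_final)
dual_value = sum(bi * yi for bi, yi in zip(b, y))
is_dual_feasible = all(yi >= 0 for yi in y) and all(
    sum(a * yi for a, yi in zip(row, y)) >= rhs for row, rhs in zip(dual_rows, dual_rhs))

print(y)                  # [Fraction(0, 1), Fraction(1, 6), Fraction(2, 3)]
print(dual_value)         # 28
print(is_dual_feasible)   # True
```

Since y is dual feasible with value 28, and the primal solution (8, 4, 0) also has value 28, Corollary 29.9 certifies that both are optimal.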


Theorem 29.10 (Linear-programming duality)

Suppose that Simplex returns values $x = (x_1, x_2, \ldots, x_n)$ for the primal
linear program $(A, b, c)$. Let $N$ and $B$ denote the nonbasic and basic variables for
the final slack form, let $c'$ denote the coefficients in the final slack form, and let
$y = (y_1, y_2, \ldots, y_m)$ be defined by equation (29.91). Then $x$ is an optimal
solution to the primal linear program, $y$ is an optimal solution to the dual linear
program, and

$$\sum_{j=1}^{n} c_j x_j = \sum_{i=1}^{m} b_i y_i . \tag{29.92}$$


Proof  By  Corollary  29.9,  if  we  can  find  feasible  solutions  x  and  y  that  satisfy 
equation  (29.92),  then  x  and  y  must  be  optimal  primal  and  dual  solutions.  We 
shall  now  show  that  the  solutions  x  and  y  described  in  the  statement  of  the  theorem 
satisfy  equation  (29.92). 

Suppose that we run Simplex on a primal linear program, as given in lines
(29.16)–(29.18). The algorithm proceeds through a series of slack forms until it
terminates with a final slack form with objective function

$$z = v' + \sum_{j \in N} c'_j x_j . \tag{29.93}$$

Since Simplex terminated with a solution, by the condition in line 3 we know that

$$c'_j \le 0 \quad \text{for all } j \in N . \tag{29.94}$$

If we define

$$c'_j = 0 \quad \text{for all } j \in B , \tag{29.95}$$

we can rewrite equation (29.93) as

$$\begin{aligned}
z &= v' + \sum_{j \in N} c'_j x_j \\
&= v' + \sum_{j \in N} c'_j x_j + \sum_{j \in B} c'_j x_j && \text{(because $c'_j = 0$ if $j \in B$)} \\
&= v' + \sum_{j=1}^{n+m} c'_j x_j && \text{(because $N \cup B = \{1, 2, \ldots, n + m\}$) .}
\end{aligned} \tag{29.96}$$

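For the basic solution $x$ associated with this final slack form, $x_j = 0$ for all $j \in N$,
and $z = v'$. Since all slack forms are equivalent, if we evaluate the original
objective function on $x$, we must obtain the same objective value:

$$\sum_{j=1}^{n} c_j x_j = v' + \sum_{j=1}^{n+m} c'_j x_j \tag{29.97}$$

$$\begin{aligned}
\phantom{\sum_{j=1}^{n} c_j x_j} &= v' + \sum_{j \in N} c'_j x_j + \sum_{j \in B} c'_j x_j \\
&= v' + \sum_{j \in N} (c'_j \cdot 0) + \sum_{j \in B} (0 \cdot x_j) \\
&= v' .
\end{aligned} \tag{29.98}$$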


We shall now show that $y$, defined by equation (29.91), is feasible for the dual
linear program and that its objective value $\sum_{i=1}^{m} b_i y_i$ equals $\sum_{j=1}^{n} c_j x_j$.
Equation (29.97) says that the first and last slack forms, evaluated at $x$, are equal. More
generally, the equivalence of all slack forms implies that for any set of values
$x = (x_1, x_2, \ldots, x_n)$, we have

$$\sum_{j=1}^{n} c_j x_j = v' + \sum_{j=1}^{n+m} c'_j x_j .$$

Therefore, for any particular set of values $x = (x_1, x_2, \ldots, x_n)$, we have

$$\begin{aligned}
\sum_{j=1}^{n} c_j x_j
&= v' + \sum_{j=1}^{n+m} c'_j x_j \\
&= v' + \sum_{j=1}^{n} c'_j x_j + \sum_{j=n+1}^{n+m} c'_j x_j \\
&= v' + \sum_{j=1}^{n} c'_j x_j + \sum_{i=1}^{m} c'_{n+i} x_{n+i} \\
&= v' + \sum_{j=1}^{n} c'_j x_j + \sum_{i=1}^{m} (-y_i) x_{n+i} && \text{(by equations (29.91) and (29.95))} \\
&= v' + \sum_{j=1}^{n} c'_j x_j + \sum_{i=1}^{m} (-y_i) \Bigl( b_i - \sum_{j=1}^{n} a_{ij} x_j \Bigr) && \text{(by equation (29.32))} \\
&= v' + \sum_{j=1}^{n} c'_j x_j - \sum_{i=1}^{m} b_i y_i + \sum_{i=1}^{m} \sum_{j=1}^{n} (a_{ij} x_j) y_i \\
&= v' + \sum_{j=1}^{n} c'_j x_j - \sum_{i=1}^{m} b_i y_i + \sum_{j=1}^{n} \sum_{i=1}^{m} (a_{ij} y_i) x_j \\
&= \Bigl( v' - \sum_{i=1}^{m} b_i y_i \Bigr) + \sum_{j=1}^{n} \Bigl( c'_j + \sum_{i=1}^{m} a_{ij} y_i \Bigr) x_j ,
\end{aligned}$$

so that

$$\sum_{j=1}^{n} c_j x_j = \Bigl( v' - \sum_{i=1}^{m} b_i y_i \Bigr) + \sum_{j=1}^{n} \Bigl( c'_j + \sum_{i=1}^{m} a_{ij} y_i \Bigr) x_j . \tag{29.99}$$

Applying Lemma 29.3 to equation (29.99), we obtain

$$v' - \sum_{i=1}^{m} b_i y_i = 0 , \tag{29.100}$$

$$c'_j + \sum_{i=1}^{m} a_{ij} y_i = c_j \quad \text{for } j = 1, 2, \ldots, n . \tag{29.101}$$


By equation (29.100), we have that $\sum_{i=1}^{m} b_i y_i = v'$, and hence the objective value
of the dual $\bigl( \sum_{i=1}^{m} b_i y_i \bigr)$ is equal to that of the primal ($v'$). It remains to show
that the solution $y$ is feasible for the dual problem. From inequalities (29.94) and
equations (29.95), we have that $c'_j \le 0$ for all $j = 1, 2, \ldots, n + m$. Hence, for any
$j = 1, 2, \ldots, n$, equations (29.101) imply that

$$c_j = c'_j + \sum_{i=1}^{m} a_{ij} y_i \le \sum_{i=1}^{m} a_{ij} y_i ,$$

which satisfies the constraints (29.84) of the dual. Finally, since $c'_j \le 0$ for each
$j \in N \cup B$, when we set $y$ according to equation (29.91), we have that each $y_i \ge 0$,
and so the nonnegativity constraints are satisfied as well. ■

We have shown that, given a feasible linear program, if Initialize-Simplex
returns a feasible solution, and if Simplex terminates without returning
“unbounded,” then the solution returned is indeed an optimal solution. We have also
shown how to construct an optimal solution to the dual linear program.

Exercises 


29.4-1 

Formulate  the  dual  of  the  linear  program  given  in  Exercise  29.3-5. 


29.4-2 

Suppose  that  we  have  a  linear  program  that  is  not  in  standard  form.  We  could 
produce  the  dual  by  first  converting  it  to  standard  form,  and  then  taking  the  dual. 
It  would  be  more  convenient,  however,  to  be  able  to  produce  the  dual  directly. 
Explain  how  we  can  directly  take  the  dual  of  an  arbitrary  linear  program. 


29.4-3 

Write down the dual of the maximum-flow linear program, as given in lines
(29.47)-(29.50) on page 860. Explain how to interpret this formulation as a
minimum-cut problem.


29.4-4 

Write  down  the  dual  of  the  minimum-cost-flow  linear  program,  as  given  in  lines 
(29.51)-(29.52)  on  page  862.  Explain  how  to  interpret  this  problem  in  terms  of 
graphs  and  flows. 


29.4-5 

Show that the dual of the dual of a linear program is the primal linear program.


29.4-6

Which result from Chapter 26 can be interpreted as weak duality for the
maximum-flow problem?


29.5  The  initial  basic  feasible  solution 


In this section, we first describe how to test whether a linear program is feasible,
and if it is, how to produce a slack form for which the basic solution is feasible.
We conclude by proving the fundamental theorem of linear programming, which
says that the Simplex procedure always produces the correct result.


Finding  an  initial  solution 


In  Section  29.3,  we  assumed  that  we  had  a  procedure  Initialize-Simplex  that 
determines  whether  a  linear  program  has  any  feasible  solutions,  and  if  it  does,  gives 
a  slack  form  for  which  the  basic  solution  is  feasible.  We  describe  this  procedure 
here. 

A  linear  program  can  be  feasible,  yet  the  initial  basic  solution  might  not  be 
feasible.  Consider,  for  example,  the  following  linear  program: 


maximize    $2x_1 - x_2$                      (29.102)

subject to

            $2x_1 - x_2 \le 2$                (29.103)
            $x_1 - 5x_2 \le -4$               (29.104)
            $x_1, x_2 \ge 0$ .                (29.105)

If we were to convert this linear program to slack form, the basic solution would
set $x_1 = 0$ and $x_2 = 0$. This solution violates constraint (29.104), and so it is not a
feasible solution. Thus, Initialize-Simplex cannot just return the obvious slack
form. In order to determine whether a linear program has any feasible solutions,
we will formulate an auxiliary linear program. For this auxiliary linear program,
we can find (with a little work) a slack form for which the basic solution is feasible.
Furthermore, the solution of this auxiliary linear program determines whether the
initial linear program is feasible and if so, it provides a feasible solution with which
we can initialize Simplex.


Lemma  29.11 

Let $L$ be a linear program in standard form, given as in (29.16)–(29.18). Let $x_0$ be
a new variable, and let $L_{aux}$ be the following linear program with $n + 1$ variables:

maximize    $-x_0$                                                         (29.106)

subject to

            $\sum_{j=1}^{n} a_{ij} x_j - x_0 \le b_i$   for $i = 1, 2, \ldots, m$ ,      (29.107)

            $x_j \ge 0$   for $j = 0, 1, \ldots, n$ .                            (29.108)

Then $L$ is feasible if and only if the optimal objective value of $L_{aux}$ is 0.

Proof Suppose that $L$ has a feasible solution $x = (x_1, x_2, \ldots, x_n)$. Then the
solution $x_0 = 0$ combined with $x$ is a feasible solution to $L_{aux}$ with objective
value 0. Since $x_0 \ge 0$ is a constraint of $L_{aux}$ and the objective function is to
maximize $-x_0$, this solution must be optimal for $L_{aux}$.

Conversely, suppose that the optimal objective value of $L_{aux}$ is 0. Then $x_0 = 0$,
and the remaining solution values of $x$ satisfy the constraints of $L$. ■
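
Constructing the auxiliary program is mechanical: every constraint gains a −x_0 term and the objective becomes −x_0. The sketch below illustrates this under our own conventions (variables ordered as x_0, x_1, ..., x_n; hypothetical names `auxiliary_program` and `satisfies`), using the two-constraint program (29.102)–(29.105) as input. It also checks that choosing x_0 large enough, with all other variables at 0, always gives a feasible point of the auxiliary program, even when the original origin is infeasible.

```python
def auxiliary_program(A, b):
    """Return (A_aux, b_aux, c_aux) for L_aux, with variable order (x_0, x_1, ..., x_n)."""
    A_aux = [[-1] + list(row) for row in A]   # add -x_0 to the left-hand side of each constraint
    c_aux = [-1] + [0] * len(A[0])            # objective: maximize -x_0
    return A_aux, list(b), c_aux

def satisfies(A, b, x):
    """True if x >= 0 and A x <= b."""
    return all(xj >= 0 for xj in x) and all(
        sum(a * xj for a, xj in zip(row, x)) <= bi for row, bi in zip(A, b))

# Constraints (29.103)-(29.104) of the example program.
A = [[2, -1],
     [1, -5]]
b = [2, -4]

A_aux, b_aux, c_aux = auxiliary_program(A, b)
print(A_aux)    # [[-1, 2, -1], [-1, 1, -5]]
print(c_aux)    # [-1, 0, 0]

print(satisfies(A, b, [0, 0]))               # False: the origin violates (29.104)
x0 = abs(min(b))                             # 4
print(satisfies(A_aux, b_aux, [x0, 0, 0]))   # True
```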

We  now  describe  our  strategy  to  find  an  initial  basic  feasible  solution  for  a  linear 
program  L  in  standard  form: 

Initialize-Simplex(A, b, c)

 1  let k be the index of the minimum b_i
 2  if b_k ≥ 0                   // is the initial basic solution feasible?
 3      return ({1, 2, ..., n}, {n + 1, n + 2, ..., n + m}, A, b, c, 0)
 4  form L_aux by adding −x_0 to the left-hand side of each constraint
        and setting the objective function to −x_0
 5  let (N, B, A, b, c, v) be the resulting slack form for L_aux
 6  l = n + k
 7  // L_aux has n + 1 nonbasic variables and m basic variables.
 8  (N, B, A, b, c, v) = Pivot(N, B, A, b, c, v, l, 0)
 9  // The basic solution is now feasible for L_aux.
10  iterate the while loop of lines 3-12 of Simplex until an optimal solution
        to L_aux is found
11  if the optimal solution to L_aux sets x_0 to 0
12      if x_0 is basic
13          perform one (degenerate) pivot to make it nonbasic
14      from the final slack form of L_aux, remove x_0 from the constraints and
            restore the original objective function of L, but replace each basic
            variable in this objective function by the right-hand side of its
            associated constraint
15      return the modified final slack form
16  else return “infeasible”

Initialize-Simplex works as follows. In lines 1-3, we implicitly test the
basic solution to the initial slack form for $L$ given by $N = \{1, 2, \ldots, n\}$,
$B = \{n + 1, n + 2, \ldots, n + m\}$, $x_i = b_i$ for all $i \in B$, and $x_j = 0$ for all $j \in N$.
(Creating the slack form requires no explicit effort, as the values of $A$, $b$, and $c$ are
the same in both slack and standard forms.) If line 2 finds this basic solution to be
feasible (that is, $x_i \ge 0$ for all $i \in N \cup B$), then line 3 returns the slack form.
Otherwise, in line 4, we form the auxiliary linear program $L_{aux}$ as in Lemma 29.11.
Since the initial basic solution to $L$ is not feasible, the initial basic solution to the
slack form for $L_{aux}$ cannot be feasible either. To find a basic feasible solution, we
perform a single pivot operation. Line 6 selects $l = n + k$ as the index of the
basic variable that will be the leaving variable in the upcoming pivot operation.
Since the basic variables are $x_{n+1}, x_{n+2}, \ldots, x_{n+m}$, the leaving variable $x_l$ will be
the one with the most negative value. Line 8 performs that call of PIVOT, with
$x_0$ entering and $x_l$ leaving. We shall see shortly that the basic solution resulting
from this call of PIVOT will be feasible. Now that we have a slack form for which
the basic solution is feasible, we can, in line 10, repeatedly call PIVOT to fully
solve the auxiliary linear program. As the test in line 11 demonstrates, if we find
an optimal solution to $L_{aux}$ with objective value 0, then in lines 12-14, we create
a slack form for $L$ for which the basic solution is feasible. To do so, we first,
in lines 12-13, handle the degenerate case in which $x_0$ may still be basic with
value $x_0 = 0$. In this case, we perform a pivot step to remove $x_0$ from the basis,
using any $e \in N$ such that $a_{0e} \ne 0$ as the entering variable. The new basic
solution remains feasible; the degenerate pivot does not change the value of any
variable. Next we delete all $x_0$ terms from the constraints and restore the original
objective function for $L$. The original objective function may contain both basic
and nonbasic variables. Therefore, in the objective function we replace each basic
variable by the right-hand side of its associated constraint. Line 15 then returns
this modified slack form. If, on the other hand, line 11 discovers that the original
linear program $L$ is infeasible, then line 16 returns this information.

We now demonstrate the operation of Initialize-Simplex on the linear
program (29.102)–(29.105). This linear program is feasible if we can find
nonnegative values for $x_1$ and $x_2$ that satisfy inequalities (29.103) and (29.104). Using
Lemma 29.11, we formulate the auxiliary linear program

maximize    $-x_0$                              (29.109)

subject to

            $2x_1 - x_2 - x_0 \le 2$            (29.110)
            $x_1 - 5x_2 - x_0 \le -4$           (29.111)
            $x_1, x_2, x_0 \ge 0$ .

By Lemma 29.11, if the optimal objective value of this auxiliary linear program
is 0, then the original linear program has a feasible solution. If the optimal objective
value of this auxiliary linear program is negative, then the original linear program
does not have a feasible solution.

We write this linear program in slack form, obtaining

$$\begin{aligned}
z &= -x_0 \\
x_3 &= 2 - 2x_1 + x_2 + x_0 \\
x_4 &= -4 - x_1 + 5x_2 + x_0 .
\end{aligned}$$

We are not out of the woods yet, because the basic solution, which would set
$x_4 = -4$, is not feasible for this auxiliary linear program. We can, however, with
one call to PIVOT, convert this slack form into one in which the basic solution is
feasible. As line 8 indicates, we choose $x_0$ to be the entering variable. In line 6, we
choose as the leaving variable $x_4$, which is the basic variable whose value in the
basic solution is most negative. After pivoting, we have the slack form

$$\begin{aligned}
z &= -4 - x_1 + 5x_2 - x_4 \\
x_0 &= 4 + x_1 - 5x_2 + x_4 \\
x_3 &= 6 - x_1 - 4x_2 + x_4 .
\end{aligned}$$

The associated basic solution is $(x_0, x_1, x_2, x_3, x_4) = (4, 0, 0, 6, 0)$, which is
feasible. We now repeatedly call PIVOT until we obtain an optimal solution to $L_{aux}$. In
this case, one call to PIVOT with $x_2$ entering and $x_0$ leaving yields

$$\begin{aligned}
z &= -x_0 \\
x_2 &= \frac{4}{5} - \frac{x_0}{5} + \frac{x_1}{5} + \frac{x_4}{5} \\
x_3 &= \frac{14}{5} + \frac{4x_0}{5} - \frac{9x_1}{5} + \frac{x_4}{5} .
\end{aligned}$$
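
Both pivots of this example can be replayed with exact arithmetic. The sketch below is our own rendering of a pivot step, not the book's PIVOT listing verbatim: the slack form is stored in dictionaries keyed by variable index, with each constraint read as x_i = b[i] − Σ_j A[i][j]·x_j and the objective as z = v + Σ_j c[j]·x_j, so the stored coefficients carry the opposite sign from the way they appear in the displayed equations.

```python
from fractions import Fraction as F

def pivot(N, B, A, b, c, v, l, e):
    """One pivot: x_e enters the basis and x_l leaves it."""
    new_A, new_b = {}, {}
    # Solve the equation for x_l for the entering variable x_e.
    new_b[e] = b[l] / A[l][e]
    new_A[e] = {j: A[l][j] / A[l][e] for j in N if j != e}
    new_A[e][l] = 1 / A[l][e]
    # Substitute that expression into every other constraint.
    for i in B - {l}:
        new_b[i] = b[i] - A[i][e] * new_b[e]
        new_A[i] = {j: A[i][j] - A[i][e] * new_A[e][j] for j in N if j != e}
        new_A[i][l] = -A[i][e] * new_A[e][l]
    # Substitute it into the objective function.
    new_v = v + c[e] * new_b[e]
    new_c = {j: c[j] - c[e] * new_A[e][j] for j in N if j != e}
    new_c[l] = -c[e] * new_A[e][l]
    return (N - {e}) | {l}, (B - {l}) | {e}, new_A, new_b, new_c, new_v

# Auxiliary slack form: z = -x0, x3 = 2 - 2x1 + x2 + x0, x4 = -4 - x1 + 5x2 + x0.
N, B = {0, 1, 2}, {3, 4}
A = {3: {0: F(-1), 1: F(2), 2: F(-1)},
     4: {0: F(-1), 1: F(1), 2: F(-5)}}
b = {3: F(2), 4: F(-4)}
c = {0: F(-1), 1: F(0), 2: F(0)}
v = F(0)

first = pivot(N, B, A, b, c, v, l=4, e=0)   # x0 enters, x4 leaves
second = pivot(*first, l=0, e=2)            # x2 enters, x0 leaves

print(first[3], first[5])     # basic values x0 = 4, x3 = 6 and objective value -4
print(second[3], second[5])   # basic values x2 = 4/5, x3 = 14/5 and objective value 0
```

After the second pivot the objective coefficients are 0 for x_1 and x_4 and −1 for x_0, which is the slack form z = −x_0 shown above.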

This slack form is the final solution to the auxiliary problem. Since this solution
has $x_0 = 0$, we know that our initial problem was feasible. Furthermore, since
$x_0 = 0$, we can just remove it from the set of constraints. We then restore the
original objective function, with appropriate substitutions made to include only
nonbasic variables. In our example, we get the objective function

$$2x_1 - x_2 = 2x_1 - \Bigl( \frac{4}{5} - \frac{x_0}{5} + \frac{x_1}{5} + \frac{x_4}{5} \Bigr) .$$

Setting $x_0 = 0$ and simplifying, we get the objective function

$$-\frac{4}{5} + \frac{9x_1}{5} - \frac{x_4}{5} ,$$

and the slack form

$$\begin{aligned}
z &= -\frac{4}{5} + \frac{9x_1}{5} - \frac{x_4}{5} \\
x_2 &= \frac{4}{5} + \frac{x_1}{5} + \frac{x_4}{5} \\
x_3 &= \frac{14}{5} - \frac{9x_1}{5} + \frac{x_4}{5} .
\end{aligned}$$

This  slack  form  has  a  feasible  basic  solution,  and  we  can  return  it  to  procedure 
Simplex. 
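
The substitution step that restores the original objective can be sketched as follows. This is one possible rendering under our own assumptions, not the book's pseudocode: the slack form is stored as in the convention x_i = b[i] − Σ_j A[i][j]·x_j, `c_orig` maps variable indices to the coefficients of the original objective 2x_1 − x_2, and the function name `restore_objective` is hypothetical. The last lines check that the resulting basic solution really satisfies the original constraints (29.103) and (29.104).

```python
from fractions import Fraction as F

def restore_objective(c_orig, N, B, A, b):
    """Rewrite sum_j c_orig[j] * x_j using nonbasic variables only.
    Returns (v, c) such that the objective equals v + sum_{j in N} c[j] * x_j."""
    v = F(0)
    c = {j: F(c_orig.get(j, 0)) for j in N}
    for i in B:
        ci = F(c_orig.get(i, 0))
        # Replace basic x_i by b[i] - sum_j A[i][j] * x_j.
        v += ci * b[i]
        for j in N:
            c[j] -= ci * A[i][j]
    return v, c

# Final slack form of L_aux with x_0 removed:
#   x2 = 4/5 + x1/5 + x4/5,   x3 = 14/5 - 9 x1/5 + x4/5.
N, B = {1, 4}, {2, 3}
A = {2: {1: F(-1, 5), 4: F(-1, 5)},
     3: {1: F(9, 5), 4: F(-1, 5)}}
b = {2: F(4, 5), 3: F(14, 5)}
c_orig = {1: 2, 2: -1}                      # original objective 2 x1 - x2

v, c = restore_objective(c_orig, N, B, A, b)
print(v, c[1], c[4])                        # -4/5 9/5 -1/5

# Basic solution: nonbasic variables are 0, basic variables take the values b[i].
x1, x2 = F(0), b[2]
print(2 * x1 - x2 <= 2, x1 - 5 * x2 <= -4)  # True True
```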

We  now  formally  show  the  correctness  of  Initialize-Simplex. 

Lemma  29.12 

If  a  linear  program  L  has  no  feasible  solution,  then  Initialize-Simplex  returns 
“infeasible.”  Otherwise,  it  returns  a  valid  slack  form  for  which  the  basic  solution 
is  feasible. 

Proof First suppose that the linear program $L$ has no feasible solution. Then by
Lemma 29.11, the optimal objective value of $L_{aux}$, defined in (29.106)–(29.108),
is nonzero, and by the nonnegativity constraint on $x_0$, the optimal objective value
must be negative. Furthermore, this objective value must be finite, since setting
$x_i = 0$, for $i = 1, 2, \ldots, n$, and $x_0 = |\min_{i=1}^{m} \{b_i\}|$ is feasible, and this solution
has objective value $-|\min_{i=1}^{m} \{b_i\}|$. Therefore, line 10 of Initialize-Simplex
finds a solution with a nonpositive objective value. Let $x$ be the basic solution
associated with the final slack form. We cannot have $x_0 = 0$, because then $L_{aux}$
would have objective value 0, which contradicts that the objective value is negative.
Thus the test in line 11 results in line 16 returning “infeasible.”

Suppose now that the linear program $L$ does have a feasible solution. From
Exercise 29.3-4, we know that if $b_i \ge 0$ for $i = 1, 2, \ldots, m$, then the basic solution
associated with the initial slack form is feasible. In this case, lines 2-3 return the
slack form associated with the input. (Converting the standard form to slack form
is easy, since $A$, $b$, and $c$ are the same in both.)

In the remainder of the proof, we handle the case in which the linear program is
feasible but we do not return in line 3. We argue that in this case, lines 4-10 find a
feasible solution to $L_{aux}$ with objective value 0. First, by lines 1-2, we must have

$$b_k < 0 ,$$

and

$$b_k \le b_i \quad \text{for each } i \in B . \tag{29.112}$$

In line 8, we perform one pivot operation in which the leaving variable $x_l$ (recall
that $l = n + k$, so that $b_l < 0$) is the left-hand side of the equation with
minimum $b_i$, and the entering variable is $x_0$, the extra added variable. We now show
that after this pivot, all entries of $b$ are nonnegative, and hence the basic solution
to $L_{aux}$ is feasible. Letting $x$ be the basic solution after the call to PIVOT, and
letting $\hat{b}$ and $\hat{B}$ be values returned by PIVOT, Lemma 29.1 implies that

$$x_i = \begin{cases} b_i - a_{ie} \hat{b}_e & \text{if } i \in \hat{B} - \{e\} , \\ b_l / a_{le} & \text{if } i = e . \end{cases} \tag{29.113}$$


The call to PIVOT in line 8 has $e = 0$. If we rewrite inequalities (29.107), to
include coefficients $a_{i0}$,

$$\sum_{j=0}^{n} a_{ij} x_j \le b_i \quad \text{for } i = 1, 2, \ldots, m , \tag{29.114}$$

then

$$a_{i0} = a_{ie} = -1 \quad \text{for each } i \in B . \tag{29.115}$$

(Note that $a_{i0}$ is the coefficient of $x_0$ as it appears in inequalities (29.114), not
the negation of the coefficient, because $L_{aux}$ is in standard rather than slack form.)
Since $l \in B$, we also have that $a_{le} = -1$. Thus, $b_l / a_{le} > 0$, and so $x_e > 0$. For
the remaining basic variables, we have

$$\begin{aligned}
x_i &= b_i - a_{ie} \hat{b}_e && \text{(by equation (29.113))} \\
&= b_i - a_{ie} (b_l / a_{le}) && \text{(by line 3 of Pivot)} \\
&= b_i - b_l && \text{(by equation (29.115) and $a_{le} = -1$)} \\
&\ge 0 && \text{(by inequality (29.112)) ,}
\end{aligned}$$

which implies that each basic variable is now nonnegative. Hence the basic
solution after the call to PIVOT in line 8 is feasible. We next execute line 10, which
solves $L_{aux}$. Since we have assumed that $L$ has a feasible solution, Lemma 29.11
implies that $L_{aux}$ has an optimal solution with objective value 0. Since all the slack
forms are equivalent, the final basic solution to $L_{aux}$ must have $x_0 = 0$, and after
removing $x_0$ from the linear program, we obtain a slack form that is feasible for $L$.
Line 15 then returns this slack form. ■
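
The key computation in this proof, that the first pivot sets x_0 to −b_l and every remaining basic variable to b_i − b_l, can be checked directly on the right-hand sides. The following small sketch assumes, as in the proof, that the minimum entry of b is negative; the function name `values_after_first_pivot` and the zero-based list indexing are ours.

```python
def values_after_first_pivot(b):
    """Given right-hand sides b_1..b_m with a negative minimum, return
    (value of x_0, values of the remaining basic variables) after the
    pivot in which x_0 enters and the row with minimum b leaves."""
    k = min(range(len(b)), key=lambda i: b[i])   # row with the minimum right-hand side
    assert b[k] < 0, "the pivot is only performed when the minimum b is negative"
    x0 = -b[k]                                   # b_l / a_le with a_le = -1
    others = [b[i] - b[k] for i in range(len(b)) if i != k]
    return x0, others

# Right-hand sides of (29.103)-(29.104): the first pivot gives x0 = 4 and x3 = 6.
print(values_after_first_pivot([2, -4]))         # (4, [6])

# An invented instance with several negative entries: all values are still nonnegative.
x0, others = values_after_first_pivot([-1, -7, 3, -2])
print(x0, others)                                # 7 [6, 10, 5]
```

Because the leaving row is the one with the smallest right-hand side, every difference b_i − b_l is nonnegative, which is exactly inequality (29.112) at work.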


Fundamental  theorem  of  linear  programming 

We conclude this chapter by showing that the Simplex procedure works. In
particular, any linear program either is infeasible, is unbounded, or has an optimal
solution with a finite objective value. In each case, Simplex acts appropriately.

Theorem 29.13 (Fundamental theorem of linear programming)

Any  linear  program  L,  given  in  standard  form,  either 

1.  has  an  optimal  solution  with  a  finite  objective  value, 

2.  is  infeasible,  or 

3.  is  unbounded. 

If  L  is  infeasible,  Simplex  returns  “infeasible.”  If  L  is  unbounded,  Simplex 
returns  “unbounded.”  Otherwise,  Simplex  returns  an  optimal  solution  with  a  finite 
objective  value. 

Proof  By Lemma 29.12, if linear program L is infeasible, then Simplex returns
“infeasible.” Now suppose that the linear program L is feasible. By Lemma 29.12,
Initialize-Simplex returns a slack form for which the basic solution is feasible.
By Lemma 29.7, therefore, Simplex either returns “unbounded” or terminates
with a feasible solution. If it terminates with a finite solution, then Theorem 29.10
tells us that this solution is optimal. On the other hand, if Simplex returns
“unbounded,” Lemma 29.2 tells us the linear program L is indeed unbounded. Since
Simplex always terminates in one of these ways, the proof is complete. ■

Exercises 


29.5-1 

Give  detailed  pseudocode  to  implement  lines  5  and  14  of  Initialize-Simplex. 


29.5-2 

Show  that  when  the  main  loop  of  Simplex  is  run  by  Initialize-Simplex,  it  can 
never  return  “unbounded.” 


29.5-3 

Suppose  that  we  are  given  a  linear  program  L  in  standard  form,  and  suppose  that 
for  both  L  and  the  dual  of  L,  the  basic  solutions  associated  with  the  initial  slack 
forms  are  feasible.  Show  that  the  optimal  objective  value  of  L  is  0. 


29.5-4 

Suppose  that  we  allow  strict  inequalities  in  a  linear  program.  Show  that  in  this 
case,  the  fundamental  theorem  of  linear  programming  does  not  hold. 


29.5-5

Solve the following linear program using Simplex:

$$\begin{array}{lrcl}
\text{maximize} & x_1 + 3x_2 \\
\text{subject to} \\
& x_1 - x_2 & \le & 8 \\
& -x_1 - x_2 & \le & -3 \\
& -x_1 + 4x_2 & \le & 2 \\
& x_1, x_2 & \ge & 0.
\end{array}$$

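29.5-6

Solve the following linear program using Simplex:

$$\begin{array}{lrcl}
\text{maximize} & x_1 - 2x_2 \\
\text{subject to} \\
& x_1 + 2x_2 & \le & 4 \\
& -2x_1 - 6x_2 & \le & -12 \\
& x_2 & \le & 1 \\
& x_1, x_2 & \ge & 0.
\end{array}$$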

29.5-7 

Solve  the  following  linear  program  using  Simplex: 
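$$\begin{array}{lrcl}
\text{maximize} & x_1 + 3x_2 \\
\text{subject to} \\
& -x_1 + x_2 & \le & -1 \\
& -x_1 - x_2 & \le & -3 \\
& -x_1 + 4x_2 & \le & 2 \\
& x_1, x_2 & \ge & 0.
\end{array}$$

29.5-8

Solve the linear program given in (29.6)–(29.10).

29.5-9

Consider the following 1-variable linear program, which we call P:

$$\begin{array}{lrcl}
\text{maximize} & tx \\
\text{subject to} \\
& rx & \le & s \\
& x & \ge & 0,
\end{array}$$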

where  r,  s,  and  t  are  arbitrary  real  numbers.  Let  D  be  the  dual  of  P . 



State  for  which  values  of  r,  s,  and  t  you  can  assert  that 

1.  Both  P  and  D  have  optimal  solutions  with  finite  objective  values. 

2.  P  is  feasible,  but  D  is  infeasible. 

3.  D  is  feasible,  but  P  is  infeasible. 

4.  Neither  P  nor  D  is  feasible. 


Problems 


29-1  Linear-inequality  feasibility 

Given a set of m linear inequalities on n variables $x_1, x_2, \ldots, x_n$, the
linear-inequality feasibility problem asks whether there is a setting of the variables that
simultaneously satisfies each of the inequalities.

a. Show that if we have an algorithm for linear programming, we can use it to
solve a linear-inequality feasibility problem. The number of variables and
constraints that you use in the linear-programming problem should be polynomial
in n and m.

b. Show that if we have an algorithm for the linear-inequality feasibility problem,
we can use it to solve a linear-programming problem. The number of variables
and linear inequalities that you use in the linear-inequality feasibility problem
should be polynomial in n and m, the number of variables and constraints in
the linear program.

29-2  Complementary  slackness 

Complementary slackness describes a relationship between the values of primal
variables and dual constraints and between the values of dual variables and
primal constraints. Let $\bar{x}$ be a feasible solution to the primal linear program given
in (29.16)–(29.18), and let $\bar{y}$ be a feasible solution to the dual linear program given
in (29.83)–(29.85). Complementary slackness states that the following conditions
are necessary and sufficient for $\bar{x}$ and $\bar{y}$ to be optimal:

$$\sum_{i=1}^{m} a_{ij}\bar{y}_i = c_j \ \text{ or } \ \bar{x}_j = 0 \quad \text{for } j = 1, 2, \ldots, n$$

and

$$\sum_{j=1}^{n} a_{ij}\bar{x}_j = b_i \ \text{ or } \ \bar{y}_i = 0 \quad \text{for } i = 1, 2, \ldots, m.$$

a. Verify that complementary slackness holds for the linear program in lines
(29.53)–(29.57).

b. Prove that complementary slackness holds for any primal linear program and
its corresponding dual.

c. Prove that a feasible solution $\bar{x}$ to a primal linear program given in lines
(29.16)–(29.18) is optimal if and only if there exist values $\bar{y} = (\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_m)$
such that

1. $\bar{y}$ is a feasible solution to the dual linear program given in (29.83)–(29.85),

2. $\sum_{i=1}^{m} a_{ij}\bar{y}_i = c_j$ for all $j$ such that $\bar{x}_j > 0$, and

3. $\bar{y}_i = 0$ for all $i$ such that $\sum_{j=1}^{n} a_{ij}\bar{x}_j < b_i$.

29-3  Integer  linear  programming 

An integer linear-programming problem is a linear-programming problem with
the additional constraint that the variables x must take on integral values.
Exercise 34.5-3 shows that just determining whether an integer linear program has a
feasible solution is NP-hard, which means that there is no known polynomial-time
algorithm for this problem.

a. Show that weak duality (Lemma 29.8) holds for an integer linear program.

b. Show that duality (Theorem 29.10) does not always hold for an integer linear
program.

c. Given a primal linear program in standard form, let us define P to be the
optimal objective value for the primal linear program, D to be the optimal objective
value for its dual, IP to be the optimal objective value for the integer version of
the primal (that is, the primal with the added constraint that the variables take
on integer values), and ID to be the optimal objective value for the integer
version of the dual. Assuming that both the primal integer program and the dual
integer program are feasible and bounded, show that

$$IP \le P = D \le ID.$$

29-4  Farkas’s  lemma 

Let A be an m × n matrix and c be an n-vector. Then Farkas’s lemma states that
exactly one of the systems

$$Ax \le 0, \qquad c^{T}x > 0$$

and

$$A^{T}y = c, \qquad y \ge 0$$

is solvable, where x is an n-vector and y is an m-vector. Prove Farkas’s lemma.

29-5  Minimum-cost circulation

In this problem, we consider a variant of the minimum-cost-flow problem from
Section 29.2 in which we are not given a demand, a source, or a sink. Instead,
we are given, as before, a flow network and edge costs $a(u, v)$. A flow is feasible
if it satisfies the capacity constraint on every edge and flow conservation at every
vertex. The goal is to find, among all feasible flows, the one of minimum cost. We
call this problem the minimum-cost-circulation problem.

a. Formulate the minimum-cost-circulation problem as a linear program.

b. Suppose that for all edges $(u, v) \in E$, we have $a(u, v) > 0$. Characterize an
optimal solution to the minimum-cost-circulation problem.

c. Formulate the maximum-flow problem as a minimum-cost-circulation problem
linear program. That is, given a maximum-flow problem instance $G = (V, E)$
with source s, sink t, and edge capacities c, create a minimum-cost-circulation
problem by giving a (possibly different) network $G' = (V', E')$ with edge
capacities $c'$ and edge costs $a'$ such that you can discern a solution to the
maximum-flow problem from a solution to the minimum-cost-circulation
problem.

d. Formulate the single-source shortest-path problem as a minimum-cost-circulation
problem linear program.


Chapter  notes 

This chapter only begins to study the wide field of linear programming. A
number of books are devoted exclusively to linear programming, including those by
Chvátal [69], Gass [130], Karloff [197], Schrijver [303], and Vanderbei [344].
Many other books give a good coverage of linear programming, including those
by Papadimitriou and Steiglitz [271] and Ahuja, Magnanti, and Orlin [7]. The
coverage in this chapter draws on the approach taken by Chvátal.

The simplex algorithm for linear programming was invented by G. Dantzig
in 1947. Shortly after, researchers discovered how to formulate a number of
problems in a variety of fields as linear programs and solve them with the simplex
algorithm. As a result, applications of linear programming flourished, along with
several algorithms. Variants of the simplex algorithm remain the most popular
methods for solving linear-programming problems. This history appears in a
number of places, including the notes in [69] and [197].

The ellipsoid algorithm was the first polynomial-time algorithm for linear
programming and is due to L. G. Khachian in 1979; it was based on earlier work by
N. Z. Shor, D. B. Judin, and A. S. Nemirovskii. Grötschel, Lovász, and Schrijver
[154] describe how to use the ellipsoid algorithm to solve a variety of problems in
combinatorial optimization. To date, the ellipsoid algorithm does not appear to be
competitive with the simplex algorithm in practice.

Karmarkar’s paper [198] includes a description of the first interior-point
algorithm. Many subsequent researchers designed interior-point algorithms. Good
surveys appear in the article of Goldfarb and Todd [141] and the book by Ye [361].

Analysis of the simplex algorithm remains an active area of research. V. Klee
and G. J. Minty constructed an example on which the simplex algorithm runs
through $2^n - 1$ iterations. The simplex algorithm usually performs very well in
practice, and many researchers have tried to give theoretical justification for this
empirical observation. A line of research begun by K. H. Borgwardt, and carried
on by many others, shows that under certain probabilistic assumptions on the
input, the simplex algorithm converges in expected polynomial time. Spielman and
Teng [322] made progress in this area, introducing the “smoothed analysis of
algorithms” and applying it to the simplex algorithm.

The simplex algorithm is known to run efficiently in certain special cases.
Particularly noteworthy is the network-simplex algorithm, which is the simplex
algorithm, specialized to network-flow problems. For certain network problems,
including the shortest-paths, maximum-flow, and minimum-cost-flow problems,
variants of the network-simplex algorithm run in polynomial time. See, for
example, the article by Orlin [268] and the citations therein.


30 


Polynomials  and  the  FFT 


The straightforward method of adding two polynomials of degree n takes $\Theta(n)$
time, but the straightforward method of multiplying them takes $\Theta(n^2)$ time. In this
chapter, we shall show how the fast Fourier transform, or FFT, can reduce the time
to multiply polynomials to $\Theta(n \lg n)$.

The most common use for Fourier transforms, and hence the FFT, is in signal
processing. A signal is given in the time domain: as a function mapping time to
amplitude. Fourier analysis allows us to express the signal as a weighted sum of
phase-shifted sinusoids of varying frequencies. The weights and phases associated
with the frequencies characterize the signal in the frequency domain. Among the
many everyday applications of FFT’s are compression techniques used to encode
digital video and audio information, including MP3 files. Several fine books delve
into the rich area of signal processing; the chapter notes reference a few of them.

Polynomials 

A polynomial in the variable x over an algebraic field F represents a function A(x)
as a formal sum:

$$A(x) = \sum_{j=0}^{n-1} a_j x^j.$$

We call the values $a_0, a_1, \ldots, a_{n-1}$ the coefficients of the polynomial. The
coefficients are drawn from a field F, typically the set $\mathbb{C}$ of complex numbers. A
polynomial A(x) has degree k if its highest nonzero coefficient is $a_k$; we write
that degree(A) = k. Any integer strictly greater than the degree of a polynomial
is a degree-bound of that polynomial. Therefore, the degree of a polynomial of
degree-bound n may be any integer between 0 and n - 1, inclusive.

We can define a variety of operations on polynomials. For polynomial
addition, if A(x) and B(x) are polynomials of degree-bound n, their sum is a
polynomial C(x), also of degree-bound n, such that C(x) = A(x) + B(x) for all x in the
underlying field. That is, if

$$A(x) = \sum_{j=0}^{n-1} a_j x^j$$

and

$$B(x) = \sum_{j=0}^{n-1} b_j x^j,$$

then

$$C(x) = \sum_{j=0}^{n-1} c_j x^j,$$

where $c_j = a_j + b_j$ for $j = 0, 1, \ldots, n-1$. For example, if we have the
polynomials $A(x) = 6x^3 + 7x^2 - 10x + 9$ and $B(x) = -2x^3 + 4x - 5$, then
$C(x) = 4x^3 + 7x^2 - 6x + 4$.
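As an illustration of coefficient-wise addition, here is a minimal Python sketch. The function name `add_polynomials` and the list layout (coefficients stored from $a_0$ up to $a_{n-1}$) are assumptions made for this example, not notation from the text.

```python
def add_polynomials(a, b):
    """Add two polynomials of the same degree-bound.

    a and b are coefficient lists ordered from the constant term upward,
    so a[j] is the coefficient of x**j. Runs in time linear in len(a).
    """
    if len(a) != len(b):
        raise ValueError("both polynomials must have the same degree-bound")
    return [aj + bj for aj, bj in zip(a, b)]


# A(x) = 6x^3 + 7x^2 - 10x + 9 and B(x) = -2x^3 + 4x - 5, lowest power first.
A = [9, -10, 7, 6]
B = [-5, 4, 0, -2]
print(add_polynomials(A, B))  # coefficients of 4x^3 + 7x^2 - 6x + 4, lowest power first
```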

For polynomial multiplication, if A(x) and B(x) are polynomials of
degree-bound n, their product C(x) is a polynomial of degree-bound 2n - 1 such that
C(x) = A(x)B(x) for all x in the underlying field. You probably have
multiplied polynomials before, by multiplying each term in A(x) by each term in B(x)
and then combining terms with equal powers. For example, we can multiply
$A(x) = 6x^3 + 7x^2 - 10x + 9$ and $B(x) = -2x^3 + 4x - 5$ as follows:

$$\begin{array}{r}
6x^3 + 7x^2 - 10x + 9 \\
-2x^3 + 4x - 5 \\
\hline
-30x^3 - 35x^2 + 50x - 45 \\
24x^4 + 28x^3 - 40x^2 + 36x \\
-12x^6 - 14x^5 + 20x^4 - 18x^3 \\
\hline
-12x^6 - 14x^5 + 44x^4 - 20x^3 - 75x^2 + 86x - 45
\end{array}$$

Another way to express the product C(x) is

$$C(x) = \sum_{j=0}^{2n-2} c_j x^j, \tag{30.1}$$

where

$$c_j = \sum_{k=0}^{j} a_k b_{j-k}. \tag{30.2}$$

Note that degree(C) = degree(A) + degree(B), implying that if A is a
polynomial of degree-bound $n_a$ and B is a polynomial of degree-bound $n_b$, then C is a
polynomial of degree-bound $n_a + n_b - 1$. Since a polynomial of degree-bound k
is also a polynomial of degree-bound k + 1, we will normally say that the product
polynomial C is a polynomial of degree-bound $n_a + n_b$.
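Equations (30.1) and (30.2) translate almost directly into code. The sketch below is one possible rendering, with the hypothetical name `multiply_polynomials`; it treats coefficients outside the range 0 to n - 1 as zero by restricting the range of k, which is an implementation detail not spelled out in the text.

```python
def multiply_polynomials(a, b):
    """Multiply two polynomials of degree-bound n using equation (30.2).

    a and b are coefficient lists ordered from the constant term upward.
    Returns the 2n - 1 coefficients c_0 .. c_{2n-2} of the product.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("both polynomials must have the same degree-bound")
    c = []
    for j in range(2 * n - 1):
        # c_j = sum over k of a_k * b_{j-k}; keep both indices inside 0..n-1.
        lo = max(0, j - (n - 1))
        hi = min(j, n - 1)
        c.append(sum(a[k] * b[j - k] for k in range(lo, hi + 1)))
    return c


# The worked example: A(x) = 6x^3 + 7x^2 - 10x + 9, B(x) = -2x^3 + 4x - 5.
print(multiply_polynomials([9, -10, 7, 6], [-5, 4, 0, -2]))
```

The two nested loops (the explicit one over j and the implicit one inside `sum`) perform on the order of $n^2$ coefficient multiplications.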

Chapter  outline 

Section 30.1 presents two ways to represent polynomials: the coefficient
representation and the point-value representation. The straightforward methods for
multiplying polynomials, equations (30.1) and (30.2), take $\Theta(n^2)$ time when we
represent polynomials in coefficient form, but only $\Theta(n)$ time when we represent them
in point-value form. We can, however, multiply polynomials using the coefficient
representation in only $\Theta(n \lg n)$ time by converting between the two
representations. To see why this approach works, we must first study complex roots of unity,
which we do in Section 30.2. Then, we use the FFT and its inverse, also described
in Section 30.2, to perform the conversions. Section 30.3 shows how to implement
the FFT quickly in both serial and parallel models.

This chapter uses complex numbers extensively, and within this chapter we use
the symbol i exclusively to denote $\sqrt{-1}$.


30.1  Representing  polynomials 

The coefficient and point-value representations of polynomials are in a sense
equivalent; that is, a polynomial in point-value form has a unique counterpart in
coefficient form. In this section, we introduce the two representations and show
how to combine them so that we can multiply two degree-bound n polynomials
in $\Theta(n \lg n)$ time.

Coefficient representation

A coefficient representation of a polynomial $A(x) = \sum_{j=0}^{n-1} a_j x^j$ of
degree-bound n is a vector of coefficients $a = (a_0, a_1, \ldots, a_{n-1})$. In matrix equations
in this chapter, we shall generally treat vectors as column vectors.

The coefficient representation is convenient for certain operations on
polynomials. For example, the operation of evaluating the polynomial A(x) at a given
point $x_0$ consists of computing the value of $A(x_0)$. We can evaluate a polynomial
in $\Theta(n)$ time using Horner’s rule:

$$A(x_0) = a_0 + x_0(a_1 + x_0(a_2 + \cdots + x_0(a_{n-2} + x_0(a_{n-1})) \cdots)).$$

Similarly, adding two polynomials represented by the coefficient vectors
$a = (a_0, a_1, \ldots, a_{n-1})$ and $b = (b_0, b_1, \ldots, b_{n-1})$ takes $\Theta(n)$ time: we just produce
the coefficient vector $c = (c_0, c_1, \ldots, c_{n-1})$, where $c_j = a_j + b_j$ for
$j = 0, 1, \ldots, n-1$.

Now, consider multiplying two degree-bound n polynomials A(x) and B(x)
represented in coefficient form. If we use the method described by equations (30.1)
and (30.2), multiplying polynomials takes time $\Theta(n^2)$, since we must multiply
each coefficient in the vector a by each coefficient in the vector b. The operation
of multiplying polynomials in coefficient form seems to be considerably more
difficult than that of evaluating a polynomial or adding two polynomials. The resulting
coefficient vector c, given by equation (30.2), is also called the convolution of the
input vectors a and b, denoted $c = a \otimes b$. Since multiplying polynomials and
computing convolutions are fundamental computational problems of considerable
practical importance, this chapter concentrates on efficient algorithms for them.

Point-value  representation 

A point-value representation of a polynomial A(x) of degree-bound n is a set of
n point-value pairs

$$\{(x_0, y_0), (x_1, y_1), \ldots, (x_{n-1}, y_{n-1})\}$$

such that all of the $x_k$ are distinct and

$$y_k = A(x_k) \tag{30.3}$$

for $k = 0, 1, \ldots, n-1$. A polynomial has many different point-value
representations, since we can use any set of n distinct points $x_0, x_1, \ldots, x_{n-1}$ as a basis for
the representation.

Computing a point-value representation for a polynomial given in coefficient
form is in principle straightforward, since all we have to do is select n distinct
points $x_0, x_1, \ldots, x_{n-1}$ and then evaluate $A(x_k)$ for $k = 0, 1, \ldots, n-1$. With
Horner’s method, evaluating a polynomial at n points takes time $\Theta(n^2)$. We shall
see later that if we choose the points $x_k$ cleverly, we can accelerate this computation
to run in time $\Theta(n \lg n)$.
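The quadratic-time conversion just described can be sketched as follows. Both function names (`horner`, `to_point_value`) and the coefficient ordering (lowest power first) are choices made for this illustration.

```python
def horner(a, x0):
    """Evaluate A(x0) by Horner's rule; a[j] is the coefficient of x**j."""
    result = 0
    for coeff in reversed(a):
        # Peel one level of a_0 + x0*(a_1 + x0*(a_2 + ...)) from the inside out.
        result = coeff + x0 * result
    return result


def to_point_value(a, xs):
    """Return the point-value pairs (x_k, A(x_k)) for the distinct points xs.

    With n coefficients and n points this makes n calls to horner, each
    linear in n, for quadratic time overall.
    """
    if len(set(xs)) != len(xs):
        raise ValueError("evaluation points must be distinct")
    return [(x, horner(a, x)) for x in xs]


# A(x) = 6x^3 + 7x^2 - 10x + 9 evaluated at the points 0, 1, 2, 3.
print(to_point_value([9, -10, 7, 6], [0, 1, 2, 3]))
```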

The inverse of evaluation, determining the coefficient form of a polynomial
from a point-value representation, is interpolation. The following theorem shows
that interpolation is well defined when the desired interpolating polynomial must
have a degree-bound equal to the given number of point-value pairs.

Theorem 30.1 (Uniqueness of an interpolating polynomial)

For any set $\{(x_0, y_0), (x_1, y_1), \ldots, (x_{n-1}, y_{n-1})\}$ of n point-value pairs such that
all the $x_k$ values are distinct, there is a unique polynomial A(x) of degree-bound n
such that $y_k = A(x_k)$ for $k = 0, 1, \ldots, n-1$.

Proof  The proof relies on the existence of the inverse of a certain matrix.
Equation (30.3) is equivalent to the matrix equation

$$\begin{pmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^{n-1} \\
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^{n-1}
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{pmatrix}
=
\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{n-1} \end{pmatrix}. \tag{30.4}$$

The matrix on the left is denoted $V(x_0, x_1, \ldots, x_{n-1})$ and is known as a
Vandermonde matrix. By Problem D-1, this matrix has determinant

$$\prod_{0 \le j < k \le n-1} (x_k - x_j),$$

and therefore, by Theorem D.5, it is invertible (that is, nonsingular) if the $x_k$ are
distinct. Thus, we can solve for the coefficients $a_j$ uniquely given the point-value
representation:

$$a = V(x_0, x_1, \ldots, x_{n-1})^{-1} y. \qquad ■$$


The proof of Theorem 30.1 describes an algorithm for interpolation based on
solving the set (30.4) of linear equations. Using the LU decomposition algorithms
of Chapter 28, we can solve these equations in time $O(n^3)$.

A faster algorithm for n-point interpolation is based on Lagrange’s formula:

$$A(x) = \sum_{k=0}^{n-1} y_k \frac{\prod_{j \ne k} (x - x_j)}{\prod_{j \ne k} (x_k - x_j)}. \tag{30.5}$$

You may wish to verify that the right-hand side of equation (30.5) is a polynomial
of degree-bound n that satisfies $A(x_k) = y_k$ for all k. Exercise 30.1-5 asks you
how to compute the coefficients of A using Lagrange’s formula in time $\Theta(n^2)$.
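To make Lagrange's formula concrete, the following sketch expands each term of the sum into coefficients and accumulates them. It is deliberately the naive expansion, which rebuilds every numerator from scratch and therefore takes cubic time; it is not the quadratic-time method that Exercise 30.1-5 asks for. The name `lagrange_interpolate` is hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used here only to sidestep rounding issues.

```python
from fractions import Fraction


def lagrange_interpolate(points):
    """Return the coefficients (lowest power first) of the unique polynomial
    of degree-bound n passing through the n given (x_k, y_k) pairs."""
    xs = [x for x, _ in points]
    if len(set(xs)) != len(xs):
        raise ValueError("the x values must be distinct")
    n = len(points)
    coeffs = [Fraction(0)] * n
    for k, (xk, yk) in enumerate(points):
        numer = [Fraction(1)]  # coefficients of the product of (x - x_j), j != k
        denom = Fraction(1)    # the product of (x_k - x_j), j != k
        for j, xj in enumerate(xs):
            if j == k:
                continue
            # Multiply numer by (x - x_j): shift up for the x term, scale for -x_j.
            shifted = [Fraction(0)] + numer
            scaled = [-xj * c for c in numer] + [Fraction(0)]
            numer = [s + t for s, t in zip(shifted, scaled)]
            denom *= xk - xj
        for i in range(n):
            coeffs[i] += yk * numer[i] / denom
    return coeffs


# Recover A(x) = 6x^3 + 7x^2 - 10x + 9 from its values at 0, 1, 2, 3.
print(lagrange_interpolate([(0, 9), (1, 12), (2, 65), (3, 204)]))
```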

Thus, n-point evaluation and interpolation are well-defined inverse operations
that transform between the coefficient representation of a polynomial and a
point-value representation.¹ The algorithms described above for these problems take
time $\Theta(n^2)$.

¹ Interpolation is a notoriously tricky problem from the point of view of numerical stability. Although
the approaches described here are mathematically correct, small differences in the inputs or round-off
errors during computation can cause large differences in the result.

The point-value representation is quite convenient for many operations on
polynomials. For addition, if C(x) = A(x) + B(x), then $C(x_k) = A(x_k) + B(x_k)$ for
any point $x_k$. More precisely, if we have a point-value representation for A,

$$\{(x_0, y_0), (x_1, y_1), \ldots, (x_{n-1}, y_{n-1})\},$$

and for B,

$$\{(x_0, y'_0), (x_1, y'_1), \ldots, (x_{n-1}, y'_{n-1})\}$$

(note that A and B are evaluated at the same n points), then a point-value
representation for C is

$$\{(x_0, y_0 + y'_0), (x_1, y_1 + y'_1), \ldots, (x_{n-1}, y_{n-1} + y'_{n-1})\}.$$

Thus, the time to add two polynomials of degree-bound n in point-value form
is $\Theta(n)$.

Similarly, the point-value representation is convenient for multiplying
polynomials. If C(x) = A(x)B(x), then $C(x_k) = A(x_k)B(x_k)$ for any point $x_k$, and
we can pointwise multiply a point-value representation for A by a point-value
representation for B to obtain a point-value representation for C. We must face the
problem, however, that degree(C) = degree(A) + degree(B); if A and B are of
degree-bound n, then C is of degree-bound 2n. A standard point-value
representation for A and B consists of n point-value pairs for each polynomial. When we
multiply these together, we get n point-value pairs, but we need 2n pairs to
interpolate a unique polynomial C of degree-bound 2n. (See Exercise 30.1-4.) We must
therefore begin with “extended” point-value representations for A and for B
consisting of 2n point-value pairs each. Given an extended point-value representation
for A,

$$\{(x_0, y_0), (x_1, y_1), \ldots, (x_{2n-1}, y_{2n-1})\},$$

and a corresponding extended point-value representation for B,

$$\{(x_0, y'_0), (x_1, y'_1), \ldots, (x_{2n-1}, y'_{2n-1})\},$$

then a point-value representation for C is

$$\{(x_0, y_0 y'_0), (x_1, y_1 y'_1), \ldots, (x_{2n-1}, y_{2n-1} y'_{2n-1})\}.$$

Given two input polynomials in extended point-value form, we see that the time to
multiply them to obtain the point-value form of the result is $\Theta(n)$, much less than
the time required to multiply polynomials in coefficient form.
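A short sketch of pointwise multiplication on extended representations follows. The helper names (`horner`, `pointwise_multiply`) and the choice of the integers 0 through 2n - 1 as evaluation points are assumptions made purely for illustration; any 2n distinct points would serve.

```python
def horner(a, x0):
    """Evaluate A(x0) by Horner's rule; a[j] is the coefficient of x**j."""
    result = 0
    for coeff in reversed(a):
        result = coeff + x0 * result
    return result


def pointwise_multiply(pa, pb):
    """Multiply two point-value representations taken at the same points.

    pa and pb are lists of (x_k, y_k) pairs; the result pairs each x_k with
    the product of the two y values, in time linear in the number of pairs.
    """
    if [x for x, _ in pa] != [x for x, _ in pb]:
        raise ValueError("both representations must use the same points")
    return [(x, ya * yb) for (x, ya), (_, yb) in zip(pa, pb)]


a = [9, -10, 7, 6]       # A(x) = 6x^3 + 7x^2 - 10x + 9, degree-bound n = 4
b = [-5, 4, 0, -2]       # B(x) = -2x^3 + 4x - 5
xs = list(range(2 * 4))  # 2n distinct points for the extended representation
pa = [(x, horner(a, x)) for x in xs]
pb = [(x, horner(b, x)) for x in xs]
pc = pointwise_multiply(pa, pb)
print(pc)
```

Each pair in `pc` agrees with the product polynomial computed earlier in coefficient form, evaluated at the same point.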

Finally,  we  consider  how  to  evaluate  a  polynomial  given  in  point-value  form  at  a 
new  point.  For  this  problem,  we  know  of  no  simpler  approach  than  converting  the 
polynomial  to  coefficient  form  first,  and  then  evaluating  it  at  the  new  point. 

Fast  multiplication  of  polynomials  in  coefficient  form 

Can we use the linear-time multiplication method for polynomials in point-value
form to expedite polynomial multiplication in coefficient form? The answer hinges
on whether we can convert a polynomial quickly from coefficient form to
point-value form (evaluate) and vice versa (interpolate).

Figure 30.1  A graphical outline of an efficient polynomial multiplication process. Representations
on the top are in coefficient form, while those on the bottom are in point-value form. The arrows
from left to right correspond to the multiplication operation. The $\omega_{2n}$ terms are complex (2n)th roots
of unity.

We can use any points we want as evaluation points, but by choosing the
evaluation points carefully, we can convert between representations in only $\Theta(n \lg n)$
time. As we shall see in Section 30.2, if we choose “complex roots of unity” as
the evaluation points, we can produce a point-value representation by taking the
discrete Fourier transform (or DFT) of a coefficient vector. We can perform the
inverse operation, interpolation, by taking the “inverse DFT” of point-value pairs,
yielding a coefficient vector. Section 30.2 will show how the FFT accomplishes
the DFT and inverse DFT operations in $\Theta(n \lg n)$ time.

Figure 30.1 shows this strategy graphically. One minor detail concerns
degree-bounds. The product of two polynomials of degree-bound n is a polynomial of
degree-bound 2n. Before evaluating the input polynomials A and B, therefore,
we first double their degree-bounds to 2n by adding n high-order coefficients of 0.
Because the vectors have 2n elements, we use “complex (2n)th roots of unity,”
which are denoted by the $\omega_{2n}$ terms in Figure 30.1.

Given the FFT, we have the following $\Theta(n \lg n)$-time procedure for multiplying
two polynomials A(x) and B(x) of degree-bound n, where the input and output
representations are in coefficient form. We assume that n is a power of 2; we can
always meet this requirement by adding high-order zero coefficients.

1. Double degree-bound: Create coefficient representations of A(x) and B(x) as
degree-bound 2n polynomials by adding n high-order zero coefficients to each.

2. Evaluate: Compute point-value representations of A(x) and B(x) of length 2n
by applying the FFT of order 2n on each polynomial. These representations
contain the values of the two polynomials at the (2n)th roots of unity.

3. Pointwise multiply: Compute a point-value representation for the polynomial
C(x) = A(x)B(x) by multiplying these values together pointwise. This
representation contains the value of C(x) at each (2n)th root of unity.

4. Interpolate: Create the coefficient representation of the polynomial C(x) by
applying the FFT on 2n point-value pairs to compute the inverse DFT.

Steps (1) and (3) take time $\Theta(n)$, and steps (2) and (4) take time $\Theta(n \lg n)$. Thus,
once we show how to use the FFT, we will have proven the following.

Theorem 30.2

We can multiply two polynomials of degree-bound n in time $\Theta(n \lg n)$, with both
the input and output representations in coefficient form. ■
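The four steps above can be sketched in Python as follows. This is an illustrative sketch, not the procedure developed in Section 30.2: the function names `fft`, `multiply_fft`, and `multiply_integer_polynomials` are hypothetical, the transform is a plain recursive radix-2 version written for this example, and it uses floating-point complex arithmetic, so results are only approximate until rounded.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 transform of a list whose length is a power of 2.

    With invert=False this evaluates the polynomial with coefficients a at the
    len(a)-th roots of unity. With invert=True it uses the inverse roots; the
    caller must still divide by len(a) to obtain the inverse DFT.
    """
    n = len(a)
    if n == 1:
        return list(a)
    sign = -1 if invert else 1
    w_n = cmath.exp(sign * 2j * cmath.pi / n)  # principal root, or its inverse
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    y = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k]
        y[k] = even[k] + t
        y[k + n // 2] = even[k] - t
        w *= w_n
    return y


def multiply_fft(a, b):
    """Multiply two coefficient vectors (lowest power first) via the four steps."""
    n = 1
    while n < max(len(a), len(b)):   # pad the degree-bound up to a power of 2
        n *= 2
    size = 2 * n                     # step 1: double the degree-bound
    fa = fft(list(a) + [0] * (size - len(a)))   # step 2: evaluate
    fb = fft(list(b) + [0] * (size - len(b)))
    fc = [u * v for u, v in zip(fa, fb)]        # step 3: pointwise multiply
    c = fft(fc, invert=True)                    # step 4: interpolate
    return [v / size for v in c]


def multiply_integer_polynomials(a, b):
    """Round the floating-point result for inputs with integer coefficients."""
    return [round(v.real) for v in multiply_fft(a, b)]


print(multiply_integer_polynomials([9, -10, 7, 6], [-5, 4, 0, -2]))
```

The output has 2n entries, so for the running example the final coefficient (of $x^7$) is zero. Rounding is safe here only because the inputs are small integers; for large coefficients or long vectors, floating-point error would need more care.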


Exercises 


30.1-1 

Multiply the polynomials $A(x) = 7x^3 - x^2 + x - 10$ and $B(x) = 8x^3 - 6x + 3$
using equations (30.1) and (30.2).


30.1-2

Another way to evaluate a polynomial A(x) of degree-bound n at a given point $x_0$
is to divide A(x) by the polynomial $(x - x_0)$, obtaining a quotient polynomial q(x)
of degree-bound n - 1 and a remainder r, such that

$$A(x) = q(x)(x - x_0) + r.$$

Clearly, $A(x_0) = r$. Show how to compute the remainder r and the coefficients
of q(x) in time $\Theta(n)$ from $x_0$ and the coefficients of A.


30.1-3

Derive a point-value representation for $A^{rev}(x) = \sum_{j=0}^{n-1} a_{n-1-j} x^j$ from a
point-value representation for $A(x) = \sum_{j=0}^{n-1} a_j x^j$, assuming that none of the points is 0.


30.1-4

Prove that n distinct point-value pairs are necessary to uniquely specify a
polynomial of degree-bound n, that is, if fewer than n distinct point-value pairs are given,
they fail to specify a unique polynomial of degree-bound n. (Hint: Using
Theorem 30.1, what can you say about a set of n - 1 point-value pairs to which you add
one more arbitrarily chosen point-value pair?)


30.1-5

Show how to use equation (30.5) to interpolate in time $\Theta(n^2)$. (Hint: First compute
the coefficient representation of the polynomial $\prod_j (x - x_j)$ and then divide by
$(x - x_k)$ as necessary for the numerator of each term; see Exercise 30.1-2. You can
compute each of the n denominators in time $O(n)$.)


30.1-6

Explain what is wrong with the “obvious” approach to polynomial division using
a point-value representation, i.e., dividing the corresponding y values. Discuss
separately the case in which the division comes out exactly and the case in which
it doesn’t.


30.1-7

Consider two sets A and B, each having n integers in the range from 0 to 10n. We
wish to compute the Cartesian sum of A and B, defined by

$$C = \{x + y : x \in A \text{ and } y \in B\}.$$

Note that the integers in C are in the range from 0 to 20n. We want to find the
elements of C and the number of times each element of C is realized as a sum of
elements in A and B. Show how to solve the problem in $O(n \lg n)$ time. (Hint:
Represent A and B as polynomials of degree at most 10n.)


30.2  The  DFT  and  FFT 

In Section 30.1, we claimed that if we use complex roots of unity, we can evaluate and interpolate polynomials in $\Theta(n \lg n)$ time. In this section, we define complex roots of unity and study their properties, define the DFT, and then show how the FFT computes the DFT and its inverse in $\Theta(n \lg n)$ time.

Complex  roots  of  unity 

A complex nth root of unity is a complex number $\omega$ such that

\[ \omega^n = 1 . \]

There are exactly $n$ complex nth roots of unity: $e^{2\pi i k/n}$ for $k = 0, 1, \ldots, n - 1$. To interpret this formula, we use the definition of the exponential of a complex number:

\[ e^{iu} = \cos(u) + i \sin(u) . \]

Figure 30.2 shows that the $n$ complex roots of unity are equally spaced around the circle of unit radius centered at the origin of the complex plane. The value

\[ \omega_n = e^{2\pi i/n} \tag{30.6} \]

is the principal nth root of unity;² all other complex nth roots of unity are powers of $\omega_n$.

Figure 30.2 The values of $\omega_8^0, \omega_8^1, \ldots, \omega_8^7$ in the complex plane, where $\omega_8 = e^{2\pi i/8}$ is the principal 8th root of unity.
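As a small numerical illustration of these definitions, the following Python sketch lists the complex nth roots of unity and the principal root. The function names `principal_root` and `roots_of_unity` are chosen here for illustration only; they are not part of the text's pseudocode. Results are floating-point approximations.

```python
import cmath

def principal_root(n):
    """Principal nth root of unity, omega_n = e^(2*pi*i/n), as in equation (30.6)."""
    return cmath.exp(2j * cmath.pi / n)

def roots_of_unity(n):
    """The n complex nth roots of unity, e^(2*pi*i*k/n) for k = 0, 1, ..., n-1."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

if __name__ == "__main__":
    # Print the eight 8th roots of unity; they lie on the unit circle,
    # spaced 45 degrees apart.
    for k, root in enumerate(roots_of_unity(8)):
        print(k, f"{root.real:+.4f} {root.imag:+.4f}i")
```

Each listed root agrees, up to rounding error, with the corresponding power of the principal root.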

The $n$ complex nth roots of unity,

\[ \omega_n^0, \omega_n^1, \ldots, \omega_n^{n-1} , \]

form a group under multiplication (see Section 31.3). This group has the same structure as the additive group $(\mathbb{Z}_n, +)$ modulo $n$, since $\omega_n^n = \omega_n^0 = 1$ implies that $\omega_n^j \omega_n^k = \omega_n^{j+k} = \omega_n^{(j+k) \bmod n}$. Similarly, $\omega_n^{-1} = \omega_n^{n-1}$. The following lemmas furnish some essential properties of the complex nth roots of unity.


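Lemma 30.3 (Cancellation lemma)

For any integers $n \ge 0$, $k \ge 0$, and $d > 0$,

\[ \omega_{dn}^{dk} = \omega_n^k . \tag{30.7} \]


Proof The lemma follows directly from equation (30.6), since

\[ \omega_{dn}^{dk} = \left( e^{2\pi i/dn} \right)^{dk} = \left( e^{2\pi i/n} \right)^{k} = \omega_n^k . \]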


² Many other authors define $\omega_n$ differently: $\omega_n = e^{-2\pi i/n}$. This alternative definition tends to be used for signal processing applications. The underlying mathematics is substantially the same with either definition of $\omega_n$.

Corollary 30.4

For any even integer $n > 0$,

\[ \omega_n^{n/2} = \omega_2 = -1 . \]

Proof The proof is left as Exercise 30.2-1.


Lemma 30.5 (Halving lemma)

If $n > 0$ is even, then the squares of the $n$ complex nth roots of unity are the $n/2$ complex (n/2)th roots of unity.

Proof By the cancellation lemma, we have $(\omega_n^k)^2 = \omega_{n/2}^k$, for any nonnegative integer $k$. Note that if we square all of the complex nth roots of unity, then we obtain each (n/2)th root of unity exactly twice, since

\[ (\omega_n^{k+n/2})^2 = \omega_n^{2k+n} = \omega_n^{2k} \omega_n^{n} = \omega_n^{2k} = (\omega_n^k)^2 . \]

Thus, $\omega_n^k$ and $\omega_n^{k+n/2}$ have the same square. We could also have used Corollary 30.4 to prove this property, since $\omega_n^{n/2} = -1$ implies $\omega_n^{k+n/2} = -\omega_n^k$, and thus $(\omega_n^{k+n/2})^2 = (\omega_n^k)^2$. ■


As we shall see, the halving lemma is essential to our divide-and-conquer approach for converting between coefficient and point-value representations of polynomials, since it guarantees that the recursive subproblems are only half as large.

Lemma 30.6 (Summation lemma)

For any integer $n \ge 1$ and nonzero integer $k$ not divisible by $n$,

\[ \sum_{j=0}^{n-1} \left( \omega_n^k \right)^j = 0 . \]

Proof Equation (A.5) applies to complex values as well as to reals, and so we have

\[ \sum_{j=0}^{n-1} \left( \omega_n^k \right)^j = \frac{(\omega_n^k)^n - 1}{\omega_n^k - 1} = \frac{(\omega_n^n)^k - 1}{\omega_n^k - 1} = \frac{(1)^k - 1}{\omega_n^k - 1} = 0 . \]

Because we require that $k$ is not divisible by $n$, and because $\omega_n^k = 1$ only when $k$ is divisible by $n$, we ensure that the denominator is not 0. ■
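The three lemmas can be spot-checked numerically. The Python sketch below is only a sanity check under floating-point arithmetic, not a proof; the helper names (`omega`, `close`, and the `check_*` functions) and the tolerance `1e-9` are assumptions made for this illustration.

```python
import cmath

def omega(n, k=1):
    """omega_n^k = e^(2*pi*i*k/n)."""
    return cmath.exp(2j * cmath.pi * k / n)

def close(x, y, tol=1e-9):
    """Floating-point comparison with an assumed tolerance."""
    return abs(x - y) < tol

def check_cancellation(n, k, d):
    """Lemma 30.3: omega_{dn}^{dk} == omega_n^k."""
    return close(omega(d * n, d * k), omega(n, k))

def check_halving(n):
    """Lemma 30.5: squaring the nth roots gives the (n/2)th roots, each twice."""
    half = n // 2
    return all(close(omega(n, k) ** 2, omega(half, k % half)) for k in range(n))

def check_summation(n, k):
    """Lemma 30.6: the sum of (omega_n^k)^j over j = 0..n-1 is 0."""
    return close(sum(omega(n, k) ** j for j in range(n)), 0)

if __name__ == "__main__":
    print(check_cancellation(8, 3, 4), check_halving(16), check_summation(8, 3))
```

When $k$ is divisible by $n$ the hypothesis of the summation lemma fails: every term of the sum is 1, so the sum is $n$ rather than 0, and `check_summation(8, 8)` reports a mismatch.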

The  DFT 

Recall that we wish to evaluate a polynomial

\[ A(x) = \sum_{j=0}^{n-1} a_j x^j \]

of degree-bound $n$ at $\omega_n^0, \omega_n^1, \omega_n^2, \ldots, \omega_n^{n-1}$ (that is, at the $n$ complex nth roots of unity).³ We assume that $A$ is given in coefficient form: $a = (a_0, a_1, \ldots, a_{n-1})$. Let us define the results $y_k$, for $k = 0, 1, \ldots, n - 1$, by

\[ y_k = A(\omega_n^k) = \sum_{j=0}^{n-1} a_j \omega_n^{kj} . \tag{30.8} \]

The vector $y = (y_0, y_1, \ldots, y_{n-1})$ is the discrete Fourier transform (DFT) of the coefficient vector $a = (a_0, a_1, \ldots, a_{n-1})$. We also write $y = \mathrm{DFT}_n(a)$.
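Equation (30.8) can be evaluated directly, one output at a time. The following Python sketch does exactly that in quadratic time; it is meant only as a reference against which faster methods can be compared. The name `naive_dft` is an assumption of this example, and the sign convention follows equation (30.6), with $\omega_n = e^{2\pi i/n}$.

```python
import cmath

def naive_dft(a):
    """Compute y_k = sum_j a_j * omega_n^(k*j) for k = 0..n-1 (equation (30.8)).

    Uses n multiplications per output, so the total work is quadratic in n.
    """
    n = len(a)
    w_n = cmath.exp(2j * cmath.pi / n)  # principal nth root of unity
    return [sum(a[j] * w_n ** (k * j) for j in range(n)) for k in range(n)]

if __name__ == "__main__":
    # The constant polynomial A(x) = 1 evaluates to 1 at every root of unity.
    print(naive_dft([1, 0, 0, 0]))
```

For the input `[1, 1, 1, 1]` the first output is 4 and the remaining outputs are 0 up to rounding error, which is the summation lemma at work.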

The  FFT 

By using a method known as the fast Fourier transform (FFT), which takes advantage of the special properties of the complex roots of unity, we can compute $\mathrm{DFT}_n(a)$ in time $\Theta(n \lg n)$, as opposed to the $\Theta(n^2)$ time of the straightforward method. We assume throughout that $n$ is an exact power of 2. Although strategies for dealing with non-power-of-2 sizes are known, they are beyond the scope of this book.

³ The length $n$ is actually what we referred to as $2n$ in Section 30.1, since we double the degree-bound of the given polynomials prior to evaluation. In the context of polynomial multiplication, therefore, we are actually working with complex (2n)th roots of unity.

The FFT method employs a divide-and-conquer strategy, using the even-indexed and odd-indexed coefficients of $A(x)$ separately to define the two new polynomials $A^{[0]}(x)$ and $A^{[1]}(x)$ of degree-bound $n/2$:

\[ A^{[0]}(x) = a_0 + a_2 x + a_4 x^2 + \cdots + a_{n-2} x^{n/2-1} , \]
\[ A^{[1]}(x) = a_1 + a_3 x + a_5 x^2 + \cdots + a_{n-1} x^{n/2-1} . \]

Note that $A^{[0]}$ contains all the even-indexed coefficients of $A$ (the binary representation of the index ends in 0) and $A^{[1]}$ contains all the odd-indexed coefficients (the binary representation of the index ends in 1). It follows that

\[ A(x) = A^{[0]}(x^2) + x A^{[1]}(x^2) , \tag{30.9} \]

so that the problem of evaluating $A(x)$ at $\omega_n^0, \omega_n^1, \ldots, \omega_n^{n-1}$ reduces to

1. evaluating the degree-bound $n/2$ polynomials $A^{[0]}(x)$ and $A^{[1]}(x)$ at the points

\[ (\omega_n^0)^2, (\omega_n^1)^2, \ldots, (\omega_n^{n-1})^2 , \tag{30.10} \]

and then

2. combining the results according to equation (30.9).

By the halving lemma, the list of values (30.10) consists not of $n$ distinct values but only of the $n/2$ complex (n/2)th roots of unity, with each root occurring exactly twice. Therefore, we recursively evaluate the polynomials $A^{[0]}$ and $A^{[1]}$ of degree-bound $n/2$ at the $n/2$ complex (n/2)th roots of unity. These subproblems have exactly the same form as the original problem, but are half the size. We have now successfully divided an n-element $\mathrm{DFT}_n$ computation into two n/2-element $\mathrm{DFT}_{n/2}$ computations. This decomposition is the basis for the following recursive FFT algorithm, which computes the DFT of an n-element vector $a = (a_0, a_1, \ldots, a_{n-1})$, where $n$ is a power of 2.

RECURSIVE-FFT(a)
 1  n = a.length                          // n is a power of 2
 2  if n == 1
 3      return a
 4  $\omega_n = e^{2\pi i/n}$
 5  $\omega = 1$
 6  $a^{[0]} = (a_0, a_2, \ldots, a_{n-2})$
 7  $a^{[1]} = (a_1, a_3, \ldots, a_{n-1})$
 8  $y^{[0]}$ = RECURSIVE-FFT($a^{[0]}$)
 9  $y^{[1]}$ = RECURSIVE-FFT($a^{[1]}$)
10  for k = 0 to n/2 - 1
11      $y_k = y_k^{[0]} + \omega \, y_k^{[1]}$
12      $y_{k+(n/2)} = y_k^{[0]} - \omega \, y_k^{[1]}$
13      $\omega = \omega \, \omega_n$
14  return y                              // y is assumed to be a column vector


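The RECURSIVE-FFT procedure works as follows. Lines 2-3 represent the basis of the recursion; the DFT of one element is the element itself, since in this case

\[ y_0 = a_0 \omega_1^0 = a_0 \cdot 1 = a_0 . \]

Lines 6-7 define the coefficient vectors for the polynomials $A^{[0]}$ and $A^{[1]}$. Lines 4, 5, and 13 guarantee that $\omega$ is updated properly so that whenever lines 11-12 are executed, we have $\omega = \omega_n^k$. (Keeping a running value of $\omega$ from iteration to iteration saves time over computing $\omega_n^k$ from scratch each time through the for loop.) Lines 8-9 perform the recursive $\mathrm{DFT}_{n/2}$ computations, setting, for $k = 0, 1, \ldots, n/2 - 1$,

\[ y_k^{[0]} = A^{[0]}(\omega_{n/2}^k) , \]
\[ y_k^{[1]} = A^{[1]}(\omega_{n/2}^k) , \]

or, since $\omega_{n/2}^k = \omega_n^{2k}$ by the cancellation lemma,

\[ y_k^{[0]} = A^{[0]}(\omega_n^{2k}) , \]
\[ y_k^{[1]} = A^{[1]}(\omega_n^{2k}) . \]

Lines 11-12 combine the results of the recursive $\mathrm{DFT}_{n/2}$ calculations. For $y_0, y_1, \ldots, y_{n/2-1}$, line 11 yields

\begin{align*}
y_k &= y_k^{[0]} + \omega_n^k y_k^{[1]} \\
    &= A^{[0]}(\omega_n^{2k}) + \omega_n^k A^{[1]}(\omega_n^{2k}) \\
    &= A(\omega_n^k) && \text{(by equation (30.9))} .
\end{align*}

For $y_{n/2}, y_{n/2+1}, \ldots, y_{n-1}$, letting $k = 0, 1, \ldots, n/2 - 1$, line 12 yields

\begin{align*}
y_{k+(n/2)} &= y_k^{[0]} - \omega_n^k y_k^{[1]} \\
            &= y_k^{[0]} + \omega_n^{k+(n/2)} y_k^{[1]} && \text{(since } \omega_n^{k+(n/2)} = -\omega_n^k \text{)} \\
            &= A^{[0]}(\omega_n^{2k}) + \omega_n^{k+(n/2)} A^{[1]}(\omega_n^{2k}) \\
            &= A^{[0]}(\omega_n^{2k+n}) + \omega_n^{k+(n/2)} A^{[1]}(\omega_n^{2k+n}) && \text{(since } \omega_n^{2k+n} = \omega_n^{2k} \text{)} \\
            &= A(\omega_n^{k+(n/2)}) && \text{(by equation (30.9))} .
\end{align*}

Thus, the vector $y$ returned by RECURSIVE-FFT is indeed the DFT of the input vector $a$.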

Lines 11 and 12 multiply each value $y_k^{[1]}$ by $\omega_n^k$, for $k = 0, 1, \ldots, n/2 - 1$. Line 11 adds this product to $y_k^{[0]}$, and line 12 subtracts it. Because we use each factor $\omega_n^k$ in both its positive and negative forms, we call the factors $\omega_n^k$ twiddle factors.

To determine the running time of procedure RECURSIVE-FFT, we note that exclusive of the recursive calls, each invocation takes time $\Theta(n)$, where $n$ is the length of the input vector. The recurrence for the running time is therefore

\begin{align*}
T(n) &= 2T(n/2) + \Theta(n) \\
     &= \Theta(n \lg n) .
\end{align*}

Thus, we can evaluate a polynomial of degree-bound $n$ at the complex nth roots of unity in time $\Theta(n \lg n)$ using the fast Fourier transform.
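For readers who want to experiment, here is one possible Python transcription of the RECURSIVE-FFT procedure. It is a sketch rather than a tuned implementation: the function name `recursive_fft`, the use of Python list slicing to split the coefficients, and the temporary `t` are choices made for this example, and the input length is assumed to be a power of 2.

```python
import cmath

def recursive_fft(a):
    """Return the DFT of a, following the structure of RECURSIVE-FFT.

    Assumes len(a) is a power of 2 and uses omega_n = e^(2*pi*i/n).
    """
    n = len(a)
    if n == 1:
        return list(a)                     # the DFT of one element is itself
    w_n = cmath.exp(2j * cmath.pi / n)     # principal nth root of unity
    w = 1                                  # running twiddle factor omega_n^k
    y0 = recursive_fft(a[0::2])            # even-indexed coefficients
    y1 = recursive_fft(a[1::2])            # odd-indexed coefficients
    y = [0] * n
    for k in range(n // 2):
        t = w * y1[k]
        y[k] = y0[k] + t
        y[k + n // 2] = y0[k] - t
        w *= w_n
    return y

if __name__ == "__main__":
    print(recursive_fft([1, 2, 3, 4]))
```

For the coefficient vector (1, 2, 3, 4), that is, $A(x) = 1 + 2x + 3x^2 + 4x^3$, the second output is $A(i) = -2 - 2i$, which can be confirmed by hand.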


Interpolation  at  the  complex  roots  of  unity 

We now complete the polynomial multiplication scheme by showing how to interpolate the complex roots of unity by a polynomial, which enables us to convert from point-value form back to coefficient form. We interpolate by writing the DFT as a matrix equation and then looking at the form of the matrix inverse.

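From equation (30.4), we can write the DFT as the matrix product $y = V_n a$, where $V_n$ is a Vandermonde matrix containing the appropriate powers of $\omega_n$:

\[
\begin{pmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-1} \end{pmatrix}
=
\begin{pmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
1 & \omega_n & \omega_n^2 & \omega_n^3 & \cdots & \omega_n^{n-1} \\
1 & \omega_n^2 & \omega_n^4 & \omega_n^6 & \cdots & \omega_n^{2(n-1)} \\
1 & \omega_n^3 & \omega_n^6 & \omega_n^9 & \cdots & \omega_n^{3(n-1)} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \omega_n^{3(n-1)} & \cdots & \omega_n^{(n-1)(n-1)}
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_{n-1} \end{pmatrix} .
\]

The $(k, j)$ entry of $V_n$ is $\omega_n^{kj}$, for $j, k = 0, 1, \ldots, n - 1$. The exponents of the entries of $V_n$ form a multiplication table.

For the inverse operation, which we write as $a = \mathrm{DFT}_n^{-1}(y)$, we proceed by multiplying $y$ by the matrix $V_n^{-1}$, the inverse of $V_n$.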


Theorem  30.7 

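For $j, k = 0, 1, \ldots, n - 1$, the $(j, k)$ entry of $V_n^{-1}$ is $\omega_n^{-kj}/n$.


Proof We show that $V_n^{-1} V_n = I_n$, the $n \times n$ identity matrix. Consider the $(j, j')$ entry of $V_n^{-1} V_n$:

\begin{align*}
[V_n^{-1} V_n]_{jj'} &= \sum_{k=0}^{n-1} (\omega_n^{-kj}/n)(\omega_n^{kj'}) \\
                     &= \sum_{k=0}^{n-1} \omega_n^{k(j'-j)}/n .
\end{align*}

This summation equals 1 if $j' = j$, and it is 0 otherwise by the summation lemma (Lemma 30.6). Note that we rely on $-(n - 1) \le j' - j \le n - 1$, so that $j' - j$ is not divisible by $n$ when $j' \ne j$, in order for the summation lemma to apply. ■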


Given the inverse matrix $V_n^{-1}$, we have that $\mathrm{DFT}_n^{-1}(y)$ is given by

\[ a_j = \frac{1}{n} \sum_{k=0}^{n-1} y_k \omega_n^{-kj} \tag{30.11} \]

for $j = 0, 1, \ldots, n - 1$. By comparing equations (30.8) and (30.11), we see that by modifying the FFT algorithm to switch the roles of $a$ and $y$, replace $\omega_n$ by $\omega_n^{-1}$, and divide each element of the result by $n$, we compute the inverse DFT (see Exercise 30.2-4). Thus, we can compute $\mathrm{DFT}_n^{-1}$ in $\Theta(n \lg n)$ time as well.
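The modification just described can be sketched in Python as follows. This is one possible arrangement, not the only one: the helper `_fft` with its `sign` parameter, and the names `fft` and `inverse_fft`, are assumptions of this example. The inverse runs the same recursion with $\omega_n^{-1}$ in place of $\omega_n$ and divides by $n$ once at the end.

```python
import cmath

def _fft(a, sign):
    """Recursive FFT using omega_n = e^(sign * 2*pi*i/n); len(a) is a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    w_n = cmath.exp(sign * 2j * cmath.pi / n)
    w = 1
    y0 = _fft(a[0::2], sign)
    y1 = _fft(a[1::2], sign)
    y = [0] * n
    for k in range(n // 2):
        t = w * y1[k]
        y[k] = y0[k] + t
        y[k + n // 2] = y0[k] - t
        w *= w_n
    return y

def fft(a):
    """DFT_n(a), as in equation (30.8)."""
    return _fft(a, +1)

def inverse_fft(y):
    """DFT_n^{-1}(y), as in equation (30.11): use omega_n^{-1}, then divide by n."""
    n = len(y)
    return [v / n for v in _fft(y, -1)]

if __name__ == "__main__":
    a = [3, 1, 4, 1, 5, 9, 2, 6]
    print([round(v.real, 6) for v in inverse_fft(fft(a))])
```

Applying `inverse_fft` to the output of `fft` recovers the original coefficients up to floating-point rounding.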

We see that, by using the FFT and the inverse FFT, we can transform a polynomial of degree-bound $n$ back and forth between its coefficient representation and a point-value representation in time $\Theta(n \lg n)$. In the context of polynomial multiplication, we have shown the following.

Theorem 30.8 (Convolution theorem)

For any two vectors $a$ and $b$ of length $n$, where $n$ is a power of 2,

\[ a \otimes b = \mathrm{DFT}_{2n}^{-1}(\mathrm{DFT}_{2n}(a) \cdot \mathrm{DFT}_{2n}(b)) , \]

where the vectors $a$ and $b$ are padded with 0s to length $2n$ and $\cdot$ denotes the componentwise product of two 2n-element vectors. ■
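The convolution theorem translates into a short program. The Python sketch below multiplies two polynomials given as coefficient vectors of equal power-of-2 length; the names `_fft` and `convolve` are assumptions of this example, and the results carry floating-point rounding error, so integer answers must be rounded by the caller.

```python
import cmath

def _fft(a, sign):
    """Recursive FFT using omega_n = e^(sign * 2*pi*i/n); len(a) is a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    w_n = cmath.exp(sign * 2j * cmath.pi / n)
    w = 1
    y0 = _fft(a[0::2], sign)
    y1 = _fft(a[1::2], sign)
    y = [0] * n
    for k in range(n // 2):
        t = w * y1[k]
        y[k] = y0[k] + t
        y[k + n // 2] = y0[k] - t
        w *= w_n
    return y

def convolve(a, b):
    """Return the 2n-element convolution of a and b via Theorem 30.8."""
    n = len(a)
    if len(b) != n or n == 0 or (n & (n - 1)) != 0:
        raise ValueError("a and b must have the same power-of-2 length")
    fa = _fft(list(a) + [0] * n, +1)            # DFT_2n of padded a
    fb = _fft(list(b) + [0] * n, +1)            # DFT_2n of padded b
    pointwise = [x * y for x, y in zip(fa, fb)]  # componentwise product
    return [v / (2 * n) for v in _fft(pointwise, -1)]  # inverse DFT_2n

if __name__ == "__main__":
    # (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
    print([round(v.real) for v in convolve([1, 2], [3, 4])])
```

Padding to length $2n$ matters: the product has degree-bound $2n - 1$, so fewer than $2n$ point-value pairs would not determine it.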


Exercises 


30.2-1 

Prove  Corollary  30.4. 


30.2-2 

Compute  the  DFT  of  the  vector  (0, 1,2,  3). 


30.2-3 

Do Exercise 30.1-1 by using the $\Theta(n \lg n)$-time scheme.


30.2-4 

Write pseudocode to compute $\mathrm{DFT}_n^{-1}$ in $\Theta(n \lg n)$ time.


30.2-5

Describe  the  generalization  of  the  FFT  procedure  to  the  case  in  which  n  is  a  power 
of  3.  Give  a  recurrence  for  the  running  time,  and  solve  the  recurrence. 

30.2-6 ⋆

Suppose that instead of performing an n-element FFT over the field of complex numbers (where $n$ is even), we use the ring $\mathbb{Z}_m$ of integers modulo $m$, where $m = 2^{tn/2} + 1$ and $t$ is an arbitrary positive integer. Use $\omega = 2^t$ instead of $\omega_n$ as a principal nth root of unity, modulo $m$. Prove that the DFT and the inverse DFT are well defined in this system.


30.2-7

Given a list of values $z_0, z_1, \ldots, z_{n-1}$ (possibly with repetitions), show how to find the coefficients of a polynomial $P(x)$ of degree-bound $n + 1$ that has zeros only at $z_0, z_1, \ldots, z_{n-1}$ (possibly with repetitions). Your procedure should run in time $O(n \lg^2 n)$. (Hint: The polynomial $P(x)$ has a zero at $z_j$ if and only if $P(x)$ is a multiple of $(x - z_j)$.)

30.2-8 ⋆

The chirp transform of a vector $a = (a_0, a_1, \ldots, a_{n-1})$ is the vector $y = (y_0, y_1, \ldots, y_{n-1})$, where $y_k = \sum_{j=0}^{n-1} a_j z^{kj}$ and $z$ is any complex number. The DFT is therefore a special case of the chirp transform, obtained by taking $z = \omega_n$. Show how to evaluate the chirp transform in time $O(n \lg n)$ for any complex number $z$. (Hint: Use the equation

\[ y_k = z^{k^2/2} \sum_{j=0}^{n-1} \left( a_j z^{j^2/2} \right) \left( z^{-(k-j)^2/2} \right) \]

to view the chirp transform as a convolution.)


30.3  Efficient  FFT  implementations 

Since the practical applications of the DFT, such as signal processing, demand the utmost speed, this section examines two efficient FFT implementations. First, we shall examine an iterative version of the FFT algorithm that runs in $\Theta(n \lg n)$ time but can have a lower constant hidden in the $\Theta$-notation than the recursive version in Section 30.2. (Depending on the exact implementation, the recursive version may use the hardware cache more efficiently.) Then, we shall use the insights that led us to the iterative implementation to design an efficient parallel FFT circuit.

An  iterative  FFT  implementation 

We first note that the for loop of lines 10-13 of RECURSIVE-FFT involves computing the value $\omega_n^k y_k^{[1]}$ twice. In compiler terminology, we call such a value a common subexpression. We can change the loop to compute it only once, storing it in a temporary variable $t$.

for k = 0 to n/2 - 1
    $t = \omega \, y_k^{[1]}$
    $y_k = y_k^{[0]} + t$
    $y_{k+(n/2)} = y_k^{[0]} - t$
    $\omega = \omega \, \omega_n$

The operation in this loop, multiplying the twiddle factor $\omega = \omega_n^k$ by $y_k^{[1]}$, storing the product into $t$, and adding and subtracting $t$ from $y_k^{[0]}$, is known as a butterfly operation and is shown schematically in Figure 30.3.
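A single butterfly operation is small enough to write as a standalone function. In the Python sketch below, the name `butterfly` and its argument names are assumptions made for illustration; the function takes the two inputs and the twiddle factor and returns the sum and the difference.

```python
def butterfly(y0_k, y1_k, w):
    """One butterfly: multiply y1_k by the twiddle factor w, then add and subtract.

    Returns the pair (y_k, y_{k+n/2}) = (y0_k + w*y1_k, y0_k - w*y1_k).
    """
    t = w * y1_k            # the common subexpression, computed once
    return y0_k + t, y0_k - t

if __name__ == "__main__":
    # With twiddle factor 1, a butterfly is the 2-element DFT of (3, 5).
    print(butterfly(3, 5, 1))
```

With $\omega = 1$ the butterfly computes the DFT of a 2-element vector, which is why the bottom level of the recursion needs only additions and subtractions.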

We now show how to make the FFT algorithm iterative rather than recursive in structure. In Figure 30.4, we have arranged the input vectors to the recursive calls in an invocation of RECURSIVE-FFT in a tree structure, where the initial call is for $n = 8$. The tree has one node for each call of the procedure, labeled by the corresponding input vector. Each RECURSIVE-FFT invocation makes two recursive calls, unless it has received a 1-element vector. The first call appears in the left child, and the second call appears in the right child.

Figure 30.3 A butterfly operation. (a) The two input values enter from the left, the twiddle factor $\omega_n^k$ is multiplied by $y_k^{[1]}$, and the sum and difference are output on the right. (b) A simplified drawing of a butterfly operation. We will use this representation in a parallel FFT circuit.

Figure 30.4 The tree of input vectors to the recursive calls of the RECURSIVE-FFT procedure. The initial invocation is for $n = 8$.

Looking at the tree, we observe that if we could arrange the elements of the initial vector $a$ into the order in which they appear in the leaves, we could trace the execution of the RECURSIVE-FFT procedure, but bottom up instead of top down. First, we take the elements in pairs, compute the DFT of each pair using one butterfly operation, and replace the pair with its DFT. The vector then holds $n/2$ 2-element DFTs. Next, we take these $n/2$ DFTs in pairs and compute the DFT of the four vector elements they come from by executing two butterfly operations, replacing two 2-element DFTs with one 4-element DFT. The vector then holds $n/4$ 4-element DFTs. We continue in this manner until the vector holds two (n/2)-element DFTs, which we combine using $n/2$ butterfly operations into the final n-element DFT.

To turn this bottom-up approach into code, we use an array $A[0 \,..\, n-1]$ that initially holds the elements of the input vector $a$ in the order in which they appear in the leaves of the tree of Figure 30.4. (We shall show later how to determine this order, which is known as a bit-reversal permutation.) Because we have to combine DFTs on each level of the tree, we introduce a variable $s$ to count the levels, ranging from 1 (at the bottom, when we are combining pairs to form 2-element DFTs) to $\lg n$ (at the top, when we are combining two (n/2)-element DFTs to produce the final result). The algorithm therefore has the following structure:

1  for s = 1 to lg n
2      for k = 0 to n - 1 by $2^s$
3          combine the two $2^{s-1}$-element DFTs in
               $A[k \,..\, k + 2^{s-1} - 1]$ and $A[k + 2^{s-1} \,..\, k + 2^s - 1]$
               into one $2^s$-element DFT in $A[k \,..\, k + 2^s - 1]$

We can express the body of the loop (line 3) as more precise pseudocode. We copy the for loop from the RECURSIVE-FFT procedure, identifying $y^{[0]}$ with $A[k \,..\, k + 2^{s-1} - 1]$ and $y^{[1]}$ with $A[k + 2^{s-1} \,..\, k + 2^s - 1]$. The twiddle factor used in each butterfly operation depends on the value of $s$; it is a power of $\omega_m$, where $m = 2^s$. (We introduce the variable $m$ solely for the sake of readability.) We introduce another temporary variable $u$ that allows us to perform the butterfly operation in place. When we replace line 3 of the overall structure by the loop body, we get the following pseudocode, which forms the basis of the parallel implementation we shall present later. The code first calls the auxiliary procedure BIT-REVERSE-COPY(a, A) to copy vector $a$ into array $A$ in the initial order in which we need the values.

ITERATIVE-FFT(a)
 1  BIT-REVERSE-COPY(a, A)
 2  n = a.length                          // n is a power of 2
 3  for s = 1 to lg n
 4      $m = 2^s$
 5      $\omega_m = e^{2\pi i/m}$
 6      for k = 0 to n - 1 by m
 7          $\omega = 1$
 8          for j = 0 to m/2 - 1
 9              $t = \omega \, A[k + j + m/2]$
10              $u = A[k + j]$
11              $A[k + j] = u + t$
12              $A[k + j + m/2] = u - t$
13              $\omega = \omega \, \omega_m$
14  return A

How does BIT-REVERSE-COPY get the elements of the input vector $a$ into the desired order in the array $A$? The order in which the leaves appear in Figure 30.4 is a bit-reversal permutation. That is, if we let $\mathrm{rev}(k)$ be the $(\lg n)$-bit integer formed by reversing the bits of the binary representation of $k$, then we want to place vector element $a_k$ in array position $A[\mathrm{rev}(k)]$. In Figure 30.4, for example, the leaves appear in the order 0, 4, 2, 6, 1, 5, 3, 7; this sequence in binary is 000, 100, 010, 110, 001, 101, 011, 111, and when we reverse the bits of each value we get the sequence 000, 001, 010, 011, 100, 101, 110, 111. To see that we want a bit-reversal permutation in general, we note that at the top level of the tree, indices whose low-order bit is 0 go into the left subtree and indices whose low-order bit is 1 go into the right subtree. Stripping off the low-order bit at each level, we continue this process down the tree, until we get the order given by the bit-reversal permutation at the leaves.

Since we can easily compute the function $\mathrm{rev}(k)$, the BIT-REVERSE-COPY procedure is simple:

BIT-REVERSE-COPY(a, A)
1  n = a.length
2  for k = 0 to n - 1
3      $A[\mathrm{rev}(k)] = a_k$
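A minimal Python sketch of the bit-reversal step follows. The function names `rev` and `bit_reverse_copy` mirror the pseudocode, but the explicit `bits` parameter and the choice to return a new list (instead of filling a caller-supplied array) are assumptions of this example. The loop-based `rev` takes time proportional to the number of bits.

```python
def rev(k, bits):
    """Reverse the low `bits` bits of the nonnegative integer k."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (k & 1)   # move k's low bit onto the result
        k >>= 1
    return result

def bit_reverse_copy(a):
    """Return a list A with A[rev(k)] = a[k]; len(a) must be a power of 2."""
    n = len(a)
    bits = n.bit_length() - 1              # lg n when n is a power of 2
    A = [None] * n
    for k in range(n):
        A[rev(k, bits)] = a[k]
    return A

if __name__ == "__main__":
    print(bit_reverse_copy(list(range(8))))   # leaf order of Figure 30.4
```

For $n = 8$ the result is the leaf order 0, 4, 2, 6, 1, 5, 3, 7 quoted in the text.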

The iterative FFT implementation runs in time $\Theta(n \lg n)$. The call to BIT-REVERSE-COPY(a, A) certainly runs in $O(n \lg n)$ time, since we iterate $n$ times and can reverse an integer between 0 and $n - 1$, with $\lg n$ bits, in $O(\lg n)$ time. (In practice, because we usually know the initial value of $n$ in advance, we would probably code a table mapping $k$ to $\mathrm{rev}(k)$, making BIT-REVERSE-COPY run in $\Theta(n)$ time with a low hidden constant. Alternatively, we could use the clever amortized reverse binary counter scheme described in Problem 17-1.) To complete the proof that ITERATIVE-FFT runs in time $\Theta(n \lg n)$, we show that $L(n)$, the number of times the body of the innermost loop (lines 8-13) executes, is $\Theta(n \lg n)$. The for loop of lines 6-13 iterates $n/m = n/2^s$ times for each value of $s$, and the innermost loop of lines 8-13 iterates $m/2 = 2^{s-1}$ times. Thus,

\begin{align*}
L(n) &= \sum_{s=1}^{\lg n} \frac{n}{2^s} \cdot 2^{s-1} \\
     &= \sum_{s=1}^{\lg n} \frac{n}{2} \\
     &= \Theta(n \lg n) .
\end{align*}

Figure 30.5 A circuit that computes the FFT in parallel, here shown on $n = 8$ inputs. Each butterfly operation takes as input the values on two wires, along with a twiddle factor, and it produces as outputs the values on two wires. The stages of butterflies are labeled to correspond to iterations of the outermost loop of the ITERATIVE-FFT procedure. Only the top and bottom wires passing through a butterfly interact with it; wires that pass through the middle of a butterfly do not affect that butterfly, nor are their values changed by that butterfly. For example, the top butterfly in stage 2 has nothing to do with wire 1 (the wire whose output is labeled $y_1$); its inputs and outputs are only on wires 0 and 2 (labeled $y_0$ and $y_2$, respectively). This circuit has depth $\Theta(\lg n)$ and performs $\Theta(n \lg n)$ butterfly operations altogether.
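The ITERATIVE-FFT procedure can be transcribed into Python along the following lines. This is a sketch under stated assumptions: the names `rev` and `iterative_fft` are chosen for this example, the bit-reversed copy is built inline instead of through a separate procedure, and the input length must be a power of 2. Each pass of the outer loop corresponds to one stage of butterflies in the circuit of Figure 30.5.

```python
import cmath

def rev(k, bits):
    """Reverse the low `bits` bits of k."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (k & 1)
        k >>= 1
    return result

def iterative_fft(a):
    """Return the DFT of a, following the structure of ITERATIVE-FFT."""
    n = len(a)
    bits = n.bit_length() - 1                  # lg n
    A = [0] * n
    for k in range(n):                         # bit-reversed copy of the input
        A[rev(k, bits)] = a[k]
    for s in range(1, bits + 1):               # one pass per level of the tree
        m = 1 << s
        w_m = cmath.exp(2j * cmath.pi / m)
        for k in range(0, n, m):               # each group of m elements
            w = 1
            for j in range(m // 2):            # butterflies within the group
                t = w * A[k + j + m // 2]
                u = A[k + j]
                A[k + j] = u + t
                A[k + j + m // 2] = u - t
                w *= w_m
    return A

if __name__ == "__main__":
    print(iterative_fft([1, 2, 3, 4]))
```

The innermost loop body runs $n/2$ times per stage, matching the count $L(n)$ derived above.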


A  parallel  FFT  circuit 

We can exploit many of the properties that allowed us to implement an efficient iterative FFT algorithm to produce an efficient parallel algorithm for the FFT. We will express the parallel FFT algorithm as a circuit. Figure 30.5 shows a parallel FFT circuit, which computes the FFT on $n$ inputs, for $n = 8$. The circuit begins with a bit-reverse permutation of the inputs, followed by $\lg n$ stages, each stage consisting of $n/2$ butterflies executed in parallel. The depth of the circuit (the maximum number of computational elements between any output and any input that can reach it) is therefore $\Theta(\lg n)$.

The leftmost part of the parallel FFT circuit performs the bit-reverse permutation, and the remainder mimics the iterative ITERATIVE-FFT procedure. Because each iteration of the outermost for loop performs $n/2$ independent butterfly operations, the circuit performs them in parallel. The value of $s$ in each iteration within ITERATIVE-FFT corresponds to a stage of butterflies shown in Figure 30.5. For $s = 1, 2, \ldots, \lg n$, stage $s$ consists of $n/2^s$ groups of butterflies (corresponding to each value of $k$ in ITERATIVE-FFT), with $2^{s-1}$ butterflies per group (corresponding to each value of $j$ in ITERATIVE-FFT). The butterflies shown in Figure 30.5 correspond to the butterfly operations of the innermost loop (lines 9-12 of ITERATIVE-FFT). Note also that the twiddle factors used in the butterflies correspond to those used in ITERATIVE-FFT: in stage $s$, we use $\omega_m^0, \omega_m^1, \ldots, \omega_m^{m/2-1}$, where $m = 2^s$.

Exercises 


30.3-1 

Show  how  Iterative-FFT  computes  the  DFT  of  the  input  vector  (0, 2, 3,  —1, 4, 
5,7,  9). 


30.3-2 

Show how to implement an FFT algorithm with the bit-reversal permutation occurring at the end, rather than at the beginning, of the computation. (Hint: Consider the inverse DFT.)


30.3-3

How many times does ITERATIVE-FFT compute twiddle factors in each stage? Rewrite ITERATIVE-FFT to compute twiddle factors only $2^{s-1}$ times in stage $s$.

30.3-4 ⋆

Suppose that the adders within the butterfly operations of the FFT circuit sometimes fail in such a manner that they always produce a zero output, independent of their inputs. Suppose that exactly one adder has failed, but that you don't know which one. Describe how you can identify the failed adder by supplying inputs to the overall FFT circuit and observing the outputs. How efficient is your method?


Problems 


30-1  Divide-and-conquer  multiplication 

a. Show how to multiply two linear polynomials $ax + b$ and $cx + d$ using only three multiplications. (Hint: One of the multiplications is $(a + b) \cdot (c + d)$.)

b. Give two divide-and-conquer algorithms for multiplying two polynomials of degree-bound $n$ in $\Theta(n^{\lg 3})$ time. The first algorithm should divide the input polynomial coefficients into a high half and a low half, and the second algorithm should divide them according to whether their index is odd or even.

c. Show how to multiply two n-bit integers in $O(n^{\lg 3})$ steps, where each step operates on at most a constant number of 1-bit values.


30-2  Toeplitz  matrices 

A Toeplitz matrix is an $n \times n$ matrix $A = (a_{ij})$ such that $a_{ij} = a_{i-1,j-1}$ for $i = 2, 3, \ldots, n$ and $j = 2, 3, \ldots, n$.

a. Is the sum of two Toeplitz matrices necessarily Toeplitz? What about the product?

b. Describe how to represent a Toeplitz matrix so that you can add two $n \times n$ Toeplitz matrices in $O(n)$ time.

c. Give an $O(n \lg n)$-time algorithm for multiplying an $n \times n$ Toeplitz matrix by a vector of length $n$. Use your representation from part (b).

d. Give an efficient algorithm for multiplying two $n \times n$ Toeplitz matrices. Analyze its running time.


30-3  Multidimensional  fast  Fourier  transform 

We can generalize the 1-dimensional discrete Fourier transform defined by equation (30.8) to $d$ dimensions. The input is a d-dimensional array $A = (a_{j_1, j_2, \ldots, j_d})$ whose dimensions are $n_1, n_2, \ldots, n_d$, where $n_1 n_2 \cdots n_d = n$. We define the d-dimensional discrete Fourier transform by the equation

\[ y_{k_1, k_2, \ldots, k_d} = \sum_{j_1=0}^{n_1-1} \sum_{j_2=0}^{n_2-1} \cdots \sum_{j_d=0}^{n_d-1} a_{j_1, j_2, \ldots, j_d} \, \omega_{n_1}^{j_1 k_1} \omega_{n_2}^{j_2 k_2} \cdots \omega_{n_d}^{j_d k_d} \]

for $0 \le k_1 < n_1$, $0 \le k_2 < n_2$, ..., $0 \le k_d < n_d$.

a. Show that we can compute a d-dimensional DFT by computing 1-dimensional DFTs on each dimension in turn. That is, we first compute $n/n_1$ separate 1-dimensional DFTs along dimension 1. Then, using the result of the DFTs along dimension 1 as the input, we compute $n/n_2$ separate 1-dimensional DFTs along dimension 2. Using this result as the input, we compute $n/n_3$ separate 1-dimensional DFTs along dimension 3, and so on, through dimension $d$.

b. Show that the ordering of dimensions does not matter, so that we can compute a d-dimensional DFT by computing the 1-dimensional DFTs in any order of the $d$ dimensions.

c. Show that if we compute each 1-dimensional DFT by computing the fast Fourier transform, the total time to compute a d-dimensional DFT is $O(n \lg n)$, independent of $d$.

30-4  Evaluating  all  derivatives  of  a  polynomial  at  a  point 

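Given a polynomial $A(x)$ of degree-bound $n$, we define its $t$th derivative by

\[
A^{(t)}(x) =
\begin{cases}
A(x) & \text{if } t = 0 , \\
\frac{d}{dx} A^{(t-1)}(x) & \text{if } 1 \le t \le n - 1 , \\
0 & \text{if } t \ge n .
\end{cases}
\]

From the coefficient representation $(a_0, a_1, \ldots, a_{n-1})$ of $A(x)$ and a given point $x_0$, we wish to determine $A^{(t)}(x_0)$ for $t = 0, 1, \ldots, n - 1$.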

a. Given coefficients $b_0, b_1, \ldots, b_{n-1}$ such that

\[ A(x) = \sum_{j=0}^{n-1} b_j (x - x_0)^j , \]

show how to compute $A^{(t)}(x_0)$, for $t = 0, 1, \ldots, n - 1$, in $O(n)$ time.

b. Explain how to find $b_0, b_1, \ldots, b_{n-1}$ in $O(n \lg n)$ time, given $A(x_0 + \omega_n^k)$ for $k = 0, 1, \ldots, n - 1$.

c. Prove that

\[ A(x_0 + \omega_n^k) = \sum_{r=0}^{n-1} \left( \frac{\omega_n^{kr}}{r!} \sum_{j=0}^{n-1} f(j) \, g(r - j) \right) , \]

where $f(j) = a_j \cdot j!$ and

\[
g(l) =
\begin{cases}
x_0^{-l} / (-l)! & \text{if } -(n - 1) \le l \le 0 , \\
0 & \text{if } 1 \le l \le n - 1 .
\end{cases}
\]

d. Explain how to evaluate $A(x_0 + \omega_n^k)$ for $k = 0, 1, \ldots, n - 1$ in $O(n \lg n)$ time. Conclude that we can evaluate all nontrivial derivatives of $A(x)$ at $x_0$ in $O(n \lg n)$ time.

30-5 Polynomial evaluation at multiple points

We have seen how to evaluate a polynomial of degree-bound $n$ at a single point in $O(n)$ time using Horner's rule. We have also discovered how to evaluate such a polynomial at all $n$ complex roots of unity in $O(n \lg n)$ time using the FFT. We shall now show how to evaluate a polynomial of degree-bound $n$ at $n$ arbitrary points in $O(n \lg^2 n)$ time.

To do so, we shall assume that we can compute the polynomial remainder when
one such polynomial is divided by another in O(n lg n) time, a result that we state
without proof. For example, the remainder of $3x^3 + x^2 - 3x + 1$ when divided by
$x^2 + x + 2$ is

\[ (3x^3 + x^2 - 3x + 1) \bmod (x^2 + x + 2) = -7x + 5 . \]
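
The remainder in this example can be checked with ordinary long division. The following Python sketch is only an illustration: the function name `poly_mod` and the highest-power-first coefficient lists are assumptions made here, and the loop takes quadratic time rather than the O(n lg n) bound the problem assumes.

```python
from fractions import Fraction

def poly_mod(a, b):
    """Remainder of a(x) divided by b(x); coefficients run from the highest power down."""
    r = [Fraction(c) for c in a]
    b = [Fraction(c) for c in b]
    while len(r) >= len(b):
        factor = r[0] / b[0]          # next coefficient of the quotient
        for i in range(len(b)):
            r[i] -= factor * b[i]     # subtract factor * b(x), aligned at the top
        r.pop(0)                      # the leading coefficient is now zero
    return r

# (3x^3 + x^2 - 3x + 1) mod (x^2 + x + 2) should be -7x + 5.
remainder = poly_mod([3, 1, -3, 1], [1, 1, 2])
print([int(c) for c in remainder])  # [-7, 5]
```

Exact rational arithmetic is used so that a divisor whose leading coefficient is not 1 does not introduce round-off.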

Given the coefficient representation of a polynomial $A(x) = \sum_{k=0}^{n-1} a_k x^k$ and
n points $x_0, x_1, \ldots, x_{n-1}$, we wish to compute the n values $A(x_0), A(x_1), \ldots,
A(x_{n-1})$. For $0 \le i \le j \le n - 1$, define the polynomials $P_{ij}(x) = \prod_{k=i}^{j} (x - x_k)$
and $Q_{ij}(x) = A(x) \bmod P_{ij}(x)$. Note that $Q_{ij}(x)$ has degree at most j − i.

a. Prove that A(x) mod (x − z) = A(z) for any point z.

b. Prove that $Q_{kk}(x) = A(x_k)$ and that $Q_{0,n-1}(x) = A(x)$.

c. Prove that for $i \le k \le j$, we have $Q_{ik}(x) = Q_{ij}(x) \bmod P_{ik}(x)$ and
$Q_{kj}(x) = Q_{ij}(x) \bmod P_{kj}(x)$.

d. Give an $O(n \lg^2 n)$-time algorithm to evaluate $A(x_0), A(x_1), \ldots, A(x_{n-1})$.

30-6  FFT  using  modular  arithmetic 

As defined, the discrete Fourier transform requires us to compute with complex
numbers, which can result in a loss of precision due to round-off errors. For some
problems, the answer is known to contain only integers, and by using a variant of
the FFT based on modular arithmetic, we can guarantee that the answer is calculated
exactly. An example of such a problem is that of multiplying two polynomials
with integer coefficients. Exercise 30.2-6 gives one approach, using a modulus of
length Ω(n) bits to handle a DFT on n points. This problem gives another approach,
which uses a modulus of the more reasonable length O(lg n); it requires
that you understand the material of Chapter 31. Let n be a power of 2.

a. Suppose that we search for the smallest k such that p = kn + 1 is prime. Give
a simple heuristic argument why we might expect k to be approximately ln n.
(The value of k might be much larger or smaller, but we can reasonably expect
to examine O(lg n) candidate values of k on average.) How does the expected
length of p compare to the length of n?

Let g be a generator of $\mathbb{Z}_p^*$, and let $w = g^k \bmod p$.

b. Argue that the DFT and the inverse DFT are well-defined inverse operations
modulo p, where w is used as a principal nth root of unity.

c. Show how to make the FFT and its inverse work modulo p in time O(n lg n),
where operations on words of O(lg n) bits take unit time. Assume that the
algorithm is given p and w.

d. Compute the DFT modulo p = 17 of the vector (0, 5, 3, 7, 7, 2, 1, 6). Note that
g = 3 is a generator of $\mathbb{Z}_{17}^*$.


Chapter  notes 

Van Loan’s book [343] provides an outstanding treatment of the fast Fourier
transform. Press, Teukolsky, Vetterling, and Flannery [283, 284] have a good
description of the fast Fourier transform and its applications. For an excellent
introduction to signal processing, a popular FFT application area, see the texts by
Oppenheim and Schafer [266] and Oppenheim and Willsky [267]. The Oppenheim and
Schafer book also shows how to handle cases in which n is not an integer power of 2.

Fourier analysis is not limited to 1-dimensional data. It is widely used in image
processing  to  analyze  data  in  2  or  more  dimensions.  The  books  by  Gonzalez  and 
Woods  [146]  and  Pratt  [281]  discuss  multidimensional  Fourier  transforms  and  their 
use  in  image  processing,  and  books  by  Tolimieri,  An,  and  Lu  [338]  and  Van  Loan 
[343]  discuss  the  mathematics  of  multidimensional  fast  Fourier  transforms. 

Cooley  and  Tukey  [76]  are  widely  credited  with  devising  the  FFT  in  the  1960s. 
The  FFT  had  in  fact  been  discovered  many  times  previously,  but  its  importance  was 
not  fully  realized  before  the  advent  of  modern  digital  computers.  Although  Press, 
Teukolsky,  Vetterling,  and  Flannery  attribute  the  origins  of  the  method  to  Runge 
and König in 1924, an article by Heideman, Johnson, and Burrus [163] traces the
history  of  the  FFT  as  far  back  as  C.  F.  Gauss  in  1805. 

Frigo and Johnson [117] developed a fast and flexible implementation of the
FFT, called FFTW (“fastest Fourier transform in the West”). FFTW is designed for
situations requiring multiple DFT computations on the same problem size. Before
actually computing the DFTs, FFTW executes a “planner,” which, by a series of
trial runs, determines how best to decompose the FFT computation for the given
problem size on the host machine. FFTW adapts to use the hardware cache
efficiently, and once subproblems are small enough, FFTW solves them with
optimized, straight-line code. Furthermore, FFTW has the unusual advantage of taking
O(n lg n) time for any problem size n, even when n is a large prime.

Although the standard Fourier transform assumes that the input represents points
that are uniformly spaced in the time domain, other techniques can approximate the
FFT on “nonequispaced” data. The article by Ware [348] provides an overview.


31 Number-Theoretic Algorithms

Number theory was once viewed as a beautiful but largely useless subject in pure
mathematics.  Today  number-theoretic  algorithms  are  used  widely,  due  in  large  part 
to  the  invention  of  cryptographic  schemes  based  on  large  prime  numbers.  These 
schemes  are  feasible  because  we  can  find  large  primes  easily,  and  they  are  secure 
because  we  do  not  know  how  to  factor  the  product  of  large  primes  (or  solve  related 
problems,  such  as  computing  discrete  logarithms)  efficiently.  This  chapter  presents 
some  of  the  number  theory  and  related  algorithms  that  underlie  such  applications. 

Section 31.1 introduces basic concepts of number theory, such as divisibility,
modular equivalence, and unique factorization. Section 31.2 studies one of the
world’s oldest algorithms: Euclid’s algorithm for computing the greatest common
divisor of two integers. Section 31.3 reviews concepts of modular arithmetic.
Section 31.4 then studies the set of multiples of a given number a, modulo n, and shows
how to find all solutions to the equation ax ≡ b (mod n) by using Euclid’s
algorithm. The Chinese remainder theorem is presented in Section 31.5. Section 31.6
considers powers of a given number a, modulo n, and presents a repeated-squaring
algorithm for efficiently computing $a^b \bmod n$, given a, b, and n. This operation is
at the heart of efficient primality testing and of much modern cryptography.
Section 31.7 then describes the RSA public-key cryptosystem. Section 31.8 examines
a randomized primality test. We can use this test to find large primes efficiently,
which we need to do in order to create keys for the RSA cryptosystem. Finally,
Section 31.9 reviews a simple but effective heuristic for factoring small integers. It
is a curious fact that factoring is one problem people may wish to be intractable,
since the security of RSA depends on the difficulty of factoring large integers.

Size  of  inputs  and  cost  of  arithmetic  computations 

Because  we  shall  be  working  with  large  integers,  we  need  to  adjust  how  we  think 
about  the  size  of  an  input  and  about  the  cost  of  elementary  arithmetic  operations. 

In this chapter, a “large input” typically means an input containing “large
integers” rather than an input containing “many integers” (as for sorting). Thus,
we shall measure the size of an input in terms of the number of bits required to
represent that input, not just the number of integers in the input. An algorithm
with integer inputs $a_1, a_2, \ldots, a_k$ is a polynomial-time algorithm if it runs in time
polynomial in $\lg a_1, \lg a_2, \ldots, \lg a_k$, that is, polynomial in the lengths of its
binary-encoded inputs.

In most of this book, we have found it convenient to think of the elementary
arithmetic operations (multiplications, divisions, or computing remainders)
as primitive operations that take one unit of time. By counting the number of such
arithmetic operations that an algorithm performs, we have a basis for making a
reasonable estimate of the algorithm’s actual running time on a computer.
Elementary operations can be time-consuming, however, when their inputs are large. It
thus becomes convenient to measure how many bit operations a number-theoretic
algorithm requires. In this model, multiplying two β-bit integers by the ordinary
method uses $\Theta(\beta^2)$ bit operations. Similarly, we can divide a β-bit integer by a
shorter integer or take the remainder of a β-bit integer when divided by a shorter
integer in time $\Theta(\beta^2)$ by simple algorithms. (See Exercise 31.1-12.) Faster methods
are known. For example, a simple divide-and-conquer method for multiplying two
β-bit integers has a running time of $\Theta(\beta^{\lg 3})$, and the fastest known method has
a running time of $\Theta(\beta \lg \beta \lg \lg \beta)$. For practical purposes, however, the $\Theta(\beta^2)$
algorithm is often best, and we shall use this bound as a basis for our analyses.
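
To make the divide-and-conquer bound concrete, here is a minimal Python sketch. It assumes that the method alluded to is the one usually attributed to Karatsuba, which the text does not name; the function name and the cutoff of 16 are illustrative choices. Each call splits both operands in half and makes three recursive multiplications instead of four, which is what gives the $\beta^{\lg 3}$ growth.

```python
def karatsuba(x, y):
    """Multiply nonnegative integers x and y using three half-size products."""
    if x < 16 or y < 16:
        return x * y  # small operands: use the built-in multiplication
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask      # x = x_hi * 2^half + x_lo
    y_hi, y_lo = y >> half, y & mask      # y = y_hi * 2^half + y_lo
    hi = karatsuba(x_hi, y_hi)
    lo = karatsuba(x_lo, y_lo)
    # (x_hi + x_lo)(y_hi + y_lo) - hi - lo equals x_hi*y_lo + x_lo*y_hi
    mid = karatsuba(x_hi + x_lo, y_hi + y_lo) - hi - lo
    return (hi << (2 * half)) + (mid << half) + lo

a, b = 12345678901234567890, 98765432109876543210
print(karatsuba(a, b) == a * b)  # True
```

Python's own integers already switch to a similar scheme for very large operands, so this sketch is for illustration rather than speed.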

We  shall  generally  analyze  algorithms  in  this  chapter  in  terms  of  both  the  number 
of  arithmetic  operations  and  the  number  of  bit  operations  they  require. 


31.1  Elementary  number-theoretic  notions 

This section provides a brief review of notions from elementary number theory
concerning the set Z = {..., −2, −1, 0, 1, 2, ...} of integers and the set
N = {0, 1, 2, ...} of natural numbers.

Divisibility  and  divisors 

The notion of one integer being divisible by another is key to the theory of numbers.
The notation d | a (read “d divides a”) means that a = kd for some integer k.
Every integer divides 0. If a > 0 and d | a, then |d| ≤ |a|. If d | a, then we also
say that a is a multiple of d. If d does not divide a, we write d ∤ a.

If d | a and d ≥ 0, we say that d is a divisor of a. Note that d | a if and only
if −d | a, so that no generality is lost by defining the divisors to be nonnegative,
with the understanding that the negative of any divisor of a also divides a. A
divisor of a nonzero integer a is at least 1 but not greater than |a|. For example, the
divisors of 24 are 1, 2, 3, 4, 6, 8, 12, and 24.

Every  positive  integer  a  is  divisible  by  the  trivial  divisors  1  and  a.  The  nontrivial 
divisors  of  a  are  the  factors  of  a.  For  example,  the  factors  of  20  are  2,  4,  5,  and  10. 
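
The two examples above can be reproduced directly from the definitions. The following Python sketch uses hypothetical helper names `divisors` and `factors`, and tries every candidate, so it is meant for small inputs only.

```python
def divisors(a):
    """Nonnegative divisors of a nonzero integer a, in increasing order."""
    a = abs(a)
    return [d for d in range(1, a + 1) if a % d == 0]

def factors(a):
    """Nontrivial divisors of a positive integer a (everything except 1 and a)."""
    return [d for d in divisors(a) if d not in (1, a)]

print(divisors(24))  # [1, 2, 3, 4, 6, 8, 12, 24]
print(factors(20))   # [2, 4, 5, 10]
```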

Prime  and  composite  numbers 

An  integer  a  >  1  whose  only  divisors  are  the  trivial  divisors  1  and  a  is  a  prime 
number  or,  more  simply,  a  prime.  Primes  have  many  special  properties  and  play  a 
critical  role  in  number  theory.  The  first  20  primes,  in  order,  are 

2,  3,  5,  7,  11,  13,  17,  19,  23,  29,  31,  37,  41,  43,  47,  53,  59,  61,  67,  71  . 

Exercise  31.1-2  asks  you  to  prove  that  there  are  infinitely  many  primes.  An  integer 
a  >  1  that  is  not  prime  is  a  composite  number  or,  more  simply,  a  composite.  For 
example,  39  is  composite  because  3  |  39.  We  call  the  integer  1  a  unit,  and  it  is 
neither  prime  nor  composite.  Similarly,  the  integer  0  and  all  negative  integers  are 
neither  prime  nor  composite. 

The  division  theorem,  remainders,  and  modular  equivalence 

Given  an  integer  n,  we  can  partition  the  integers  into  those  that  are  multiples  of  n 
and  those  that  are  not  multiples  of  n .  Much  number  theory  is  based  upon  refining 
this  partition  by  classifying  the  nonmultiples  of  n  according  to  their  remainders 
when  divided  by  n .  The  following  theorem  provides  the  basis  for  this  refinement. 
We  omit  the  proof  (but  see,  for  example,  Niven  and  Zuckerman  [265]). 

Theorem 31.1 (Division theorem)

For any integer a and any positive integer n, there exist unique integers q and r
such that 0 ≤ r < n and a = qn + r. ■

The value q = ⌊a/n⌋ is the quotient of the division. The value r = a mod n
is the remainder (or residue) of the division. We have that n | a if and only if
a mod n = 0.
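
As a small illustration of the division theorem, the Python sketch below computes the pair (q, r). The wrapper name `divide` is an assumption made here. It relies on the fact that Python's built-in `divmod` floors the quotient, so for positive n the remainder always satisfies 0 ≤ r < n, even when a is negative.

```python
def divide(a, n):
    """Return (q, r) with a == q*n + r and 0 <= r < n, for integer a and positive n."""
    if n <= 0:
        raise ValueError("n must be positive")
    q, r = divmod(a, n)  # floored quotient, hence a nonnegative remainder
    return q, r

print(divide(17, 7))   # (2, 3)
print(divide(-11, 7))  # (-2, 3)
```

Languages whose integer division truncates toward zero would give a remainder of −4 for the second call, which does not match the theorem's convention.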

We can partition the integers into n equivalence classes according to their
remainders modulo n. The equivalence class modulo n containing an integer a is

\[ [a]_n = \{ a + kn : k \in \mathbb{Z} \} . \]

For example, $[3]_7 = \{\ldots, -11, -4, 3, 10, 17, \ldots\}$; we can also denote this set by
$[-4]_7$ and $[10]_7$. Using the notation defined on page 54, we can say that writing
$a \in [b]_n$ is the same as writing a ≡ b (mod n). The set of all such equivalence
classes is

\[ \mathbb{Z}_n = \{ [a]_n : 0 \le a \le n - 1 \} . \tag{31.1} \]

When you see the definition

\[ \mathbb{Z}_n = \{ 0, 1, \ldots, n - 1 \} , \tag{31.2} \]

you should read it as equivalent to equation (31.1) with the understanding that 0
represents $[0]_n$, 1 represents $[1]_n$, and so on; each class is represented by its smallest
nonnegative element. You should keep the underlying equivalence classes in mind,
however. For example, if we refer to −1 as a member of $\mathbb{Z}_n$, we are really referring
to $[n - 1]_n$, since −1 ≡ n − 1 (mod n).

Common  divisors  and  greatest  common  divisors 

If  d  is  a  divisor  of  a  and  d  is  also  a  divisor  of  b,  then  d  is  a  common  divisor  of  a 
and  b.  For  example,  the  divisors  of  30  are  1,  2,  3,  5,  6,  10,  15,  and  30,  and  so  the 
common  divisors  of  24  and  30  are  1,  2,  3,  and  6.  Note  that  1  is  a  common  divisor 
of  any  two  integers. 

An important property of common divisors is that

d | a and d | b implies d | (a + b) and d | (a − b) . (31.3)

More generally, we have that

d | a and d | b implies d | (ax + by) (31.4)

for any integers x and y. Also, if a | b, then either |a| ≤ |b| or b = 0, which
implies that

a | b and b | a implies a = ±b . (31.5)

The greatest common divisor of two integers a and b, not both zero, is the
largest of the common divisors of a and b; we denote it by gcd(a, b). For example,
gcd(24, 30) = 6, gcd(5, 7) = 1, and gcd(0, 9) = 9. If a and b are both nonzero,
then gcd(a, b) is an integer between 1 and min(|a|, |b|). We define gcd(0, 0) to
be 0; this definition is necessary to make standard properties of the gcd function
(such as equation (31.9) below) universally valid.

The following are elementary properties of the gcd function:

\begin{align}
\gcd(a, b) &= \gcd(b, a) , \tag{31.6} \\
\gcd(a, b) &= \gcd(-a, b) , \tag{31.7} \\
\gcd(a, b) &= \gcd(|a|, |b|) , \tag{31.8} \\
\gcd(a, 0) &= |a| , \tag{31.9} \\
\gcd(a, ka) &= |a| \quad \text{for any } k \in \mathbb{Z} . \tag{31.10}
\end{align}
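
Properties (31.6) through (31.10) can be spot-checked numerically. The Python sketch below uses the standard library's `math.gcd`, which accepts negative arguments and returns a nonnegative result with gcd(0, 0) = 0, matching the convention adopted above. The range of test values is an arbitrary choice, and passing these checks is of course not a proof.

```python
from itertools import product
from math import gcd

for a, b in product(range(-6, 7), repeat=2):
    assert gcd(a, b) == gcd(b, a)            # (31.6)
    assert gcd(a, b) == gcd(-a, b)           # (31.7)
    assert gcd(a, b) == gcd(abs(a), abs(b))  # (31.8)
    assert gcd(a, 0) == abs(a)               # (31.9)
    for k in range(-3, 4):
        assert gcd(a, k * a) == abs(a)       # (31.10)

print(gcd(24, 30), gcd(5, 7), gcd(0, 9), gcd(0, 0))  # 6 1 9 0
```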

The following theorem provides an alternative and useful characterization of
gcd(a, b).

Theorem 31.2

If a and b are any integers, not both zero, then gcd(a, b) is the smallest positive
element of the set {ax + by : x, y ∈ Z} of linear combinations of a and b.

Proof Let s be the smallest positive such linear combination of a and b, and let
s = ax + by for some x, y ∈ Z. Let q = ⌊a/s⌋. Equation (3.8) then implies

a mod s = a − qs
        = a − q(ax + by)
        = a(1 − qx) + b(−qy) ,

and so a mod s is a linear combination of a and b as well. But, since
0 ≤ a mod s < s, we have that a mod s = 0, because s is the smallest positive such
linear combination. Therefore, we have that s | a and, by analogous reasoning, s | b.
Thus, s is a common divisor of a and b, and so gcd(a, b) ≥ s. Equation (31.4)
implies that gcd(a, b) | s, since gcd(a, b) divides both a and b and s is a linear
combination of a and b. But gcd(a, b) | s and s > 0 imply that gcd(a, b) ≤ s.
Combining gcd(a, b) ≥ s and gcd(a, b) ≤ s yields gcd(a, b) = s. We conclude
that s is the greatest common divisor of a and b. ■
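
Theorem 31.2 can be checked by brute force on small inputs. The Python sketch below is an illustration under stated assumptions: the function name and the search bound of 50 are choices made here, and the bound is simply large enough for the sample pairs to contain a combination equal to the gcd.

```python
from math import gcd

def smallest_positive_combination(a, b, bound=50):
    """Smallest positive value of a*x + b*y over -bound <= x, y <= bound."""
    values = (a * x + b * y
              for x in range(-bound, bound + 1)
              for y in range(-bound, bound + 1))
    return min(v for v in values if v > 0)

for a, b in [(24, 30), (5, 7), (0, 9), (-12, 18), (99, 78)]:
    assert smallest_positive_combination(a, b) == gcd(a, b)

print(smallest_positive_combination(99, 78))  # 3
```

Every value ax + by is a multiple of gcd(a, b), so the search can never report something smaller than the gcd; the bound only has to be wide enough to reach it.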

Corollary 31.3

For any integers a and b, if d | a and d | b, then d | gcd(a, b).

Proof This corollary follows from equation (31.4), because gcd(a, b) is a linear
combination of a and b by Theorem 31.2. ■

Corollary 31.4

For all integers a and b and any nonnegative integer n,

gcd(an, bn) = n gcd(a, b) .

Proof If n = 0, the corollary is trivial. If n > 0, then gcd(an, bn) is the smallest
positive element of the set {anx + bny : x, y ∈ Z}, which is n times the smallest
positive element of the set {ax + by : x, y ∈ Z}. ■

Corollary 31.5

For all positive integers n, a, and b, if n | ab and gcd(a, n) = 1, then n | b.

Proof We leave the proof as Exercise 31.1-5. ■

Relatively prime integers

Two integers a and b are relatively prime if their only common divisor is 1, that
is, if gcd(a, b) = 1. For example, 8 and 15 are relatively prime, since the divisors
of 8 are 1, 2, 4, and 8, and the divisors of 15 are 1, 3, 5, and 15. The following
theorem states that if two integers are each relatively prime to an integer p, then
their product is relatively prime to p.

Theorem 31.6

For any integers a, b, and p, if both gcd(a, p) = 1 and gcd(b, p) = 1, then
gcd(ab, p) = 1.

Proof It follows from Theorem 31.2 that there exist integers x, y, x', and y' such
that

ax + py = 1 ,
bx' + py' = 1 .

Multiplying these equations and rearranging, we have

ab(xx') + p(ybx' + y'ax + pyy') = 1 .

Since 1 is thus a positive linear combination of ab and p, an appeal to
Theorem 31.2 completes the proof. ■

Integers $n_1, n_2, \ldots, n_k$ are pairwise relatively prime if, whenever i ≠ j, we
have $\gcd(n_i, n_j) = 1$.

Unique  factorization 

An elementary but important fact about divisibility by primes is the following.

Theorem 31.7

For all primes p and all integers a and b, if p | ab, then p | a or p | b (or both).

Proof Assume for the purpose of contradiction that p | ab, but that p ∤ a and
p ∤ b. Thus, gcd(a, p) = 1 and gcd(b, p) = 1, since the only divisors of p are 1
and p, and we assume that p divides neither a nor b. Theorem 31.6 then implies
that gcd(ab, p) = 1, contradicting our assumption that p | ab, since p | ab
implies gcd(ab, p) = p. This contradiction completes the proof. ■

A consequence of Theorem 31.7 is that we can uniquely factor any composite
integer into a product of primes.

Theorem 31.8 (Unique factorization)

There is exactly one way to write any composite integer a as a product of the form

\[ a = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r} , \]

where the $p_i$ are prime, $p_1 < p_2 < \cdots < p_r$, and the $e_i$ are positive integers.

Proof We leave the proof as Exercise 31.1-11. ■

As an example, the number 6000 is uniquely factored into primes as $2^4 \cdot 3 \cdot 5^3$.
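
The factorization of 6000 can be reproduced with trial division. The Python sketch below is only an illustration for small inputs: the function name and the list-of-pairs output are assumptions made here, and trial division is far too slow for the large integers this chapter is concerned with.

```python
def prime_factorization(a):
    """Return [(p1, e1), (p2, e2), ...] with p1 < p2 < ... for an integer a > 1."""
    result = []
    p = 2
    while p * p <= a:
        if a % p == 0:
            e = 0
            while a % p == 0:   # divide out every copy of p
                a //= p
                e += 1
            result.append((p, e))
        p += 1
    if a > 1:
        result.append((a, 1))   # whatever remains is a single prime
    return result

print(prime_factorization(6000))  # [(2, 4), (3, 1), (5, 3)]
```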

Exercises 

31.1-1

Prove that if a > b > 0 and c = a + b, then c mod a = b.

31.1-2

Prove that there are infinitely many primes. (Hint: Show that none of the primes
$p_1, p_2, \ldots, p_k$ divide $(p_1 p_2 \cdots p_k) + 1$.)

31.1-3

Prove that if a | b and b | c, then a | c.

31.1-4

Prove that if p is prime and 0 < k < p, then gcd(k, p) = 1.

31.1-5

Prove Corollary 31.5.

31.1-6

Prove that if p is prime and 0 < k < p, then $p \mid \binom{p}{k}$. Conclude that for all integers
a and b and all primes p,

\[ (a + b)^p \equiv a^p + b^p \pmod{p} . \]


31.1-7 

Prove that if a and b are any positive integers such that a | b, then

(x mod b) mod a = x mod a

for any x. Prove, under the same assumptions, that

x ≡ y (mod b) implies x ≡ y (mod a)

for any integers x and y.

31.1-8

For any integer k > 0, an integer n is a kth power if there exists an integer a such
that $a^k = n$. Furthermore, n > 1 is a nontrivial power if it is a kth power for
some integer k > 1. Show how to determine whether a given β-bit integer n is a
nontrivial power in time polynomial in β.


31.1-9

Prove equations (31.6)-(31.10).

31.1-10

Show that the gcd operator is associative. That is, prove that for all integers a, b,
and c,

gcd(a, gcd(b, c)) = gcd(gcd(a, b), c) .

31.1-11 ★

Prove Theorem 31.8.

31.1-12

Give efficient algorithms for the operations of dividing a β-bit integer by a shorter
integer and of taking the remainder of a β-bit integer when divided by a shorter
integer. Your algorithms should run in time $\Theta(\beta^2)$.

31.1-13

Give an efficient algorithm to convert a given β-bit (binary) integer to a decimal
representation. Argue that if multiplication or division of integers whose length
is at most β takes time M(β), then we can convert binary to decimal in time
$\Theta(M(\beta) \lg \beta)$. (Hint: Use a divide-and-conquer approach, obtaining the top and
bottom halves of the result with separate recursions.)


31.2  Greatest  common  divisor 

In this section, we describe Euclid’s algorithm for efficiently computing the
greatest common divisor of two integers. When we analyze the running time, we shall
see a surprising connection with the Fibonacci numbers, which yield a worst-case
input for Euclid’s algorithm.

We restrict ourselves in this section to nonnegative integers. This restriction is
justified by equation (31.8), which states that gcd(a, b) = gcd(|a|, |b|).

In principle, we can compute gcd(a, b) for positive integers a and b from the
prime factorizations of a and b. Indeed, if

\[ a = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r} , \tag{31.11} \]

\[ b = p_1^{f_1} p_2^{f_2} \cdots p_r^{f_r} , \tag{31.12} \]

with zero exponents being used to make the set of primes $p_1, p_2, \ldots, p_r$ the same
for both a and b, then, as Exercise 31.2-1 asks you to show,

\[ \gcd(a, b) = p_1^{\min(e_1, f_1)} p_2^{\min(e_2, f_2)} \cdots p_r^{\min(e_r, f_r)} . \tag{31.13} \]

As we shall show in Section 31.9, however, the best algorithms to date for factoring
do not run in polynomial time. Thus, this approach to computing greatest common
divisors seems unlikely to yield an efficient algorithm.
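
For small inputs, equation (31.13) can be turned into code directly. The Python sketch below is an illustration only: `factorize` and `gcd_by_factoring` are hypothetical names, and the factoring step uses trial division, which is exactly the part that does not scale.

```python
from math import gcd, prod

def factorize(n):
    """Map each prime factor of n >= 1 to its exponent, by trial division."""
    exps = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            exps[p] = exps.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps

def gcd_by_factoring(a, b):
    """gcd of positive integers a and b via the minimum-exponent formula."""
    fa, fb = factorize(a), factorize(b)
    # A prime missing from one factorization has exponent 0 there,
    # so only primes common to both contribute to the product.
    return prod(p ** min(e, fb[p]) for p, e in fa.items() if p in fb)

print(gcd_by_factoring(24, 30))  # 6
```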

Euclid’s algorithm for computing greatest common divisors relies on the
following theorem.

Theorem 31.9 (GCD recursion theorem)

For any nonnegative integer a and any positive integer b,

gcd(a, b) = gcd(b, a mod b) .

Proof We shall show that gcd(a, b) and gcd(b, a mod b) divide each other, so
that by equation (31.5) they must be equal (since they are both nonnegative).

We first show that gcd(a, b) | gcd(b, a mod b). If we let d = gcd(a, b), then
d | a and d | b. By equation (3.8), a mod b = a − qb, where q = ⌊a/b⌋.
Since a mod b is thus a linear combination of a and b, equation (31.4) implies that
d | (a mod b). Therefore, since d | b and d | (a mod b), Corollary 31.3 implies
that d | gcd(b, a mod b) or, equivalently, that

gcd(a, b) | gcd(b, a mod b) . (31.14)

Showing that gcd(b, a mod b) | gcd(a, b) is almost the same. If we now let
d = gcd(b, a mod b), then d | b and d | (a mod b). Since a = qb + (a mod b),
where q = ⌊a/b⌋, we have that a is a linear combination of b and (a mod b). By
equation (31.4), we conclude that d | a. Since d | b and d | a, we have that
d | gcd(a, b) by Corollary 31.3 or, equivalently, that

gcd(b, a mod b) | gcd(a, b) . (31.15)

Using equation (31.5) to combine equations (31.14) and (31.15) completes the
proof. ■

Euclid’s algorithm

The Elements of Euclid (circa 300 B.C.) describes the following gcd algorithm,
although it may be of even earlier origin. We express Euclid’s algorithm as a
recursive program based directly on Theorem 31.9. The inputs a and b are arbitrary
nonnegative integers.

Euclid(a, b)
1  if b == 0
2      return a
3  else return Euclid(b, a mod b)

As an example of the running of Euclid, consider the computation of gcd(30, 21):

Euclid(30, 21) = Euclid(21, 9)
               = Euclid(9, 3)
               = Euclid(3, 0)
               = 3 .

This computation calls Euclid recursively three times.

The correctness of Euclid follows from Theorem 31.9 and the property that if
the algorithm returns a in line 2, then b = 0, so that equation (31.9) implies that
gcd(a, b) = gcd(a, 0) = a. The algorithm cannot recurse indefinitely, since the
second argument strictly decreases in each recursive call and is always nonnegative.
Therefore, Euclid always terminates with the correct answer.
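
The procedure translates almost line for line into Python. In the sketch below, `euclid` mirrors the recursive pseudocode, while `euclid_trace` is a hypothetical helper added here to list the argument pairs of successive calls so that the gcd(30, 21) example can be reproduced.

```python
def euclid(a, b):
    """gcd of nonnegative integers a and b, following the recursive procedure."""
    if b == 0:
        return a
    return euclid(b, a % b)

def euclid_trace(a, b):
    """List the argument pairs of every call made while computing gcd(a, b)."""
    calls = [(a, b)]
    while b != 0:
        a, b = b, a % b
        calls.append((a, b))
    return calls

print(euclid(30, 21))        # 3
print(euclid_trace(30, 21))  # [(30, 21), (21, 9), (9, 3), (3, 0)]
```

The trace has four entries: the initial call plus the three recursive calls counted in the text.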


The  running  time  of  Euclid’s  algorithm 

We analyze the worst-case running time of Euclid as a function of the size of
a and b. We assume with no loss of generality that a > b ≥ 0. To justify this
assumption, observe that if b > a ≥ 0, then Euclid(a, b) immediately makes the
recursive call Euclid(b, a). That is, if the first argument is less than the second
argument, Euclid spends one recursive call swapping its arguments and then
proceeds. Similarly, if b = a > 0, the procedure terminates after one recursive call,
since a mod b = 0.

The overall running time of Euclid is proportional to the number of recursive
calls it makes. Our analysis makes use of the Fibonacci numbers $F_k$, defined by
the recurrence (3.22).

Lemma 31.10

If a > b ≥ 1 and the call Euclid(a, b) performs k ≥ 1 recursive calls, then
$a \ge F_{k+2}$ and $b \ge F_{k+1}$.

Proof The proof proceeds by induction on k. For the basis of the induction, let
k = 1. Then, $b \ge 1 = F_2$, and since a > b, we must have $a \ge 2 = F_3$. Since
b > (a mod b), in each recursive call the first argument is strictly larger than the
second; the assumption that a > b therefore holds for each recursive call.

Assume inductively that the lemma holds if k − 1 recursive calls are made; we
shall then prove that the lemma holds for k recursive calls. Since k > 0, we have
b > 0, and Euclid(a, b) calls Euclid(b, a mod b) recursively, which in turn
makes k − 1 recursive calls. The inductive hypothesis then implies that $b \ge F_{k+1}$
(thus proving part of the lemma), and $a \bmod b \ge F_k$. We have

b + (a mod b) = b + (a − b⌊a/b⌋)
              ≤ a ,

since a > b > 0 implies ⌊a/b⌋ ≥ 1. Thus,

\begin{align*}
a &\ge b + (a \bmod b) \\
  &\ge F_{k+1} + F_k \\
  &= F_{k+2} . \qquad ■
\end{align*}

The following theorem is an immediate corollary of this lemma.

Theorem 31.11 (Lamé’s theorem)

For any integer k ≥ 1, if a > b ≥ 1 and $b < F_{k+1}$, then the call Euclid(a, b)
makes fewer than k recursive calls. ■

We can show that the upper bound of Theorem 31.11 is the best possible by
showing that the call Euclid($F_{k+1}$, $F_k$) makes exactly k − 1 recursive calls
when k ≥ 2. We use induction on k. For the base case, k = 2, and the call
Euclid($F_3$, $F_2$) makes exactly one recursive call, to Euclid(1, 0). (We have to
start at k = 2, because when k = 1 we do not have $F_2 > F_1$.) For the inductive
step, assume that Euclid($F_k$, $F_{k-1}$) makes exactly k − 2 recursive calls. For
k > 2, we have $F_k > F_{k-1} > 0$ and $F_{k+1} = F_k + F_{k-1}$, and so by Exercise 31.1-1,
we have $F_{k+1} \bmod F_k = F_{k-1}$. Thus, we have

\begin{align*}
\gcd(F_{k+1}, F_k) &= \gcd(F_k, F_{k+1} \bmod F_k) \\
                   &= \gcd(F_k, F_{k-1}) .
\end{align*}

Therefore, the call Euclid($F_{k+1}$, $F_k$) recurses one time more than the call
Euclid($F_k$, $F_{k-1}$), or exactly k − 1 times, meeting the upper bound of
Theorem 31.11.
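
The claim about consecutive Fibonacci numbers is easy to check empirically. The Python sketch below uses hypothetical helpers `recursive_calls` and `fib`; it counts the recursive calls with an equivalent loop and confirms the count k − 1 for a range of k. The upper limit of 30 is arbitrary.

```python
def recursive_calls(a, b):
    """Number of recursive calls Euclid(a, b) makes (the initial call is not counted)."""
    count = 0
    while b != 0:
        a, b = b, a % b
        count += 1
    return count

def fib(k):
    """F_k with F_0 = 0 and F_1 = 1."""
    x, y = 0, 1
    for _ in range(k):
        x, y = y, x + y
    return x

for k in range(2, 30):
    assert recursive_calls(fib(k + 1), fib(k)) == k - 1

print(recursive_calls(fib(11), fib(10)))  # 9
```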

Since $F_k$ is approximately $\phi^k / \sqrt{5}$, where $\phi$ is the golden ratio $(1 + \sqrt{5})/2$
defined by equation (3.24), the number of recursive calls in Euclid is O(lg b). (See
Exercise 31.2-5 for a tighter bound.) Therefore, if we call Euclid on two β-bit
numbers, then it performs O(β) arithmetic operations and $O(\beta^3)$ bit operations
(assuming that multiplication and division of β-bit numbers take $O(\beta^2)$ bit
operations). Problem 31-2 asks you to show an $O(\beta^2)$ bound on the number of bit
operations.

Figure 31.1 How Extended-Euclid computes gcd(99, 78). Each line shows one level of the
recursion: the values of the inputs a and b, the computed value ⌊a/b⌋, and the values d, x, and y
returned. The triple (d, x, y) returned becomes the triple (d', x', y') used at the next higher level
of recursion. The call Extended-Euclid(99, 78) returns (3, −11, 14), so that gcd(99, 78) = 3 =
99 · (−11) + 78 · 14.

The  extended  form  of  Euclid’s  algorithm 

We  now  rewrite  Euclid’s  algorithm  to  compute  additional  useful  information. 
Specifically,  we  extend  the  algorithm  to  compute  the  integer  coefficients  x  and  y 
such  that 

d  =  gcd (a,b)  =  ax  +  by  .  (31.16) 

Note  that  x  and  y  may  be  zero  or  negative.  We  shall  find  these  coefficients  useful 
later  for  computing  modular  multiplicative  inverses.  The  procedure  Extended- 
Euclid  takes  as  input  a  pair  of  nonnegative  integers  and  returns  a  triple  of  the 
form  ( d ,  x,  y)  that  satisfies  equation  (31.16). 

Extended-Euclid(a, b)
1  if b == 0
2      return (a, 1, 0)
3  else (d', x', y') = Extended-Euclid(b, a mod b)
4      (d, x, y) = (d', y', x' − ⌊a/b⌋ y')
5      return (d, x, y)

Figure  31.1  illustrates  how  Extended-Euclid  computes  gcd(99, 78). 

The Extended-Euclid procedure is a variation of the Euclid procedure.
Line 1 is equivalent to the test “b == 0” in line 1 of Euclid. If b = 0, then
Extended-Euclid returns not only d = a in line 2, but also the coefficients
x = 1 and y = 0, so that a = ax + by. If b ≠ 0, Extended-Euclid first
computes (d', x', y') such that d' = gcd(b, a mod b) and

d' = bx' + (a mod b)y' .    (31.17)

As for Euclid, we have in this case d = gcd(a, b) = d' = gcd(b, a mod b).
To obtain x and y such that d = ax + by, we start by rewriting equation (31.17)
using the equation d = d' and equation (3.8):

d = bx' + (a − b⌊a/b⌋)y'
  = ay' + b(x' − ⌊a/b⌋ y') .

Thus, choosing x = y' and y = x' − ⌊a/b⌋ y' satisfies the equation d = ax + by,
proving the correctness of Extended-Euclid.
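As an illustration, the procedure translates almost line for line into Python. This is a minimal sketch, assuming nonnegative integer inputs as the procedure requires; the function name extended_euclid and the variable names x1 and y1 (standing for x' and y') are choices made here.

```python
def extended_euclid(a, b):
    """Return (d, x, y) with d = gcd(a, b) = a*x + b*y, for nonnegative a, b."""
    if b == 0:
        return a, 1, 0
    # (d, x1, y1) plays the role of (d', x', y') in the pseudocode.
    d, x1, y1 = extended_euclid(b, a % b)
    # a // b is the floor of a/b for nonnegative integers.
    return d, y1, x1 - (a // b) * y1


d, x, y = extended_euclid(99, 78)
assert (d, x, y) == (3, -11, 14)
assert 99 * x + 78 * y == d
```

The two assertions reproduce the values stated in the caption of Figure 31.1.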

Since  the  number  of  recursive  calls  made  in  Euclid  is  equal  to  the  number 
of  recursive  calls  made  in  Extended-Euclid,  the  running  times  of  Euclid 
and  Extended-Euclid  are  the  same,  to  within  a  constant  factor.  That  is,  for 
a > b > 0, the number of recursive calls is O(lg b).

Exercises 


31.2-1

Prove that equations (31.11) and (31.12) imply equation (31.13).

31.2-2

Compute the values (d, x, y) that the call Extended-Euclid(899, 493) returns.

31.2-3

Prove that for all integers a, k, and n,
gcd(a, n) = gcd(a + kn, n) .

31.2-4

Rewrite Euclid in an iterative form that uses only a constant amount of memory
(that is, stores only a constant number of integer values).

31.2-5

If a > b ≥ 0, show that the call Euclid(a, b) makes at most 1 + log_φ b recursive
calls. Improve this bound to 1 + log_φ(b / gcd(a, b)).

31.2-6

What does Extended-Euclid(F_{k+1}, F_k) return? Prove your answer correct.

31.2-7

Define the gcd function for more than two arguments by the recursive equation
gcd(a_0, a_1, ..., a_n) = gcd(a_0, gcd(a_1, a_2, ..., a_n)). Show that the gcd function
returns the same answer independent of the order in which its arguments are
specified. Also show how to find integers x_0, x_1, ..., x_n such that
gcd(a_0, a_1, ..., a_n) = a_0 x_0 + a_1 x_1 + ··· + a_n x_n. Show that the number of
divisions performed by your algorithm is O(n + lg(max{a_0, a_1, ..., a_n})).

31.2-8

Define lcm(a_1, a_2, ..., a_n) to be the least common multiple of the n integers
a_1, a_2, ..., a_n, that is, the smallest nonnegative integer that is a multiple of each a_i.
Show how to compute lcm(a_1, a_2, ..., a_n) efficiently using the (two-argument) gcd
operation as a subroutine.

31.2-9

Prove that n_1, n_2, n_3, and n_4 are pairwise relatively prime if and only if
gcd(n_1 n_2, n_3 n_4) = gcd(n_1 n_3, n_2 n_4) = 1 .

More generally, show that n_1, n_2, ..., n_k are pairwise relatively prime if and only
if a set of ⌈lg k⌉ pairs of numbers derived from the n_i are relatively prime.


31.3  Modular  arithmetic 

Informally,  we  can  think  of  modular  arithmetic  as  arithmetic  as  usual  over  the 
integers,  except  that  if  we  are  working  modulo  n,  then  every  result  x  is  replaced 
by the element of {0, 1, ..., n − 1} that is equivalent to x, modulo n (that is, x is
replaced  by  x  mod  n).  This  informal  model  suffices  if  we  stick  to  the  operations 
of  addition,  subtraction,  and  multiplication.  A  more  formal  model  for  modular 
arithmetic,  which  we  now  give,  is  best  described  within  the  framework  of  group 
theory. 

Finite  groups 

A group (S, ⊕) is a set S together with a binary operation ⊕ defined on S for
which the following properties hold:

1. Closure: For all a, b ∈ S, we have a ⊕ b ∈ S.

2. Identity: There exists an element e ∈ S, called the identity of the group, such
that e ⊕ a = a ⊕ e = a for all a ∈ S.

3. Associativity: For all a, b, c ∈ S, we have (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c).

4. Inverses: For each a ∈ S, there exists a unique element b ∈ S, called the
inverse of a, such that a ⊕ b = b ⊕ a = e.

As an example, consider the familiar group (Z, +) of the integers Z under the
operation of addition: 0 is the identity, and the inverse of a is −a. If a group (S, ⊕)
satisfies the commutative law a ⊕ b = b ⊕ a for all a, b ∈ S, then it is an abelian
group. If a group (S, ⊕) satisfies |S| < ∞, then it is a finite group.

The  groups  defined  by  modular  addition  and  multiplication 

We  can  form  two  finite  abelian  groups  by  using  addition  and  multiplication  mod¬ 
ulo  n,  where  n  is  a  positive  integer.  These  groups  are  based  on  the  equivalence 
classes  of  the  integers  modulo  n,  defined  in  Section  31.1. 

To define a group on Z_n, we need to have suitable binary operations, which
we obtain by redefining the ordinary operations of addition and multiplication.
We can easily define addition and multiplication operations for Z_n, because the
equivalence class of two integers uniquely determines the equivalence class of their
sum or product. That is, if a ≡ a' (mod n) and b ≡ b' (mod n), then

a + b ≡ a' + b' (mod n) ,
ab ≡ a'b' (mod n) .

Thus, we define addition and multiplication modulo n, denoted +_n and ·_n, by

[a]_n +_n [b]_n = [a + b]_n ,    (31.18)
[a]_n ·_n [b]_n = [ab]_n .

(We can define subtraction similarly on Z_n by [a]_n −_n [b]_n = [a − b]_n, but
division is more complicated, as we shall see.) These facts justify the common and
convenient practice of using the smallest nonnegative element of each equivalence
class as its representative when performing computations in Z_n. We add, subtract,
and multiply as usual on the representatives, but we replace each result x by the
representative of its class, that is, by x mod n.

Using this definition of addition modulo n, we define the additive group
modulo n as (Z_n, +_n). The size of the additive group modulo n is |Z_n| = n.
Figure 31.2(a) gives the operation table for the group (Z_6, +_6).

Theorem  31.12 

The system (Z_n, +_n) is a finite abelian group.

Proof Equation (31.18) shows that (Z_n, +_n) is closed. Associativity and
commutativity of +_n follow from the associativity and commutativity of +:

([a]_n +_n [b]_n) +_n [c]_n = [a + b]_n +_n [c]_n
                            = [(a + b) + c]_n
                            = [a + (b + c)]_n
                            = [a]_n +_n [b + c]_n
                            = [a]_n +_n ([b]_n +_n [c]_n) ,

[a]_n +_n [b]_n = [a + b]_n
                = [b + a]_n
                = [b]_n +_n [a]_n .

The identity element of (Z_n, +_n) is 0 (that is, [0]_n). The (additive) inverse of
an element a (that is, of [a]_n) is the element −a (that is, [−a]_n or [n − a]_n), since
[a]_n +_n [−a]_n = [a − a]_n = [0]_n. ■

Figure 31.2 Two finite groups. Equivalence classes are denoted by their representative elements.
(a) The group (Z_6, +_6). (b) The group (Z_15^*, ·_15).


Using the definition of multiplication modulo n, we define the multiplicative
group modulo n as (Z_n^*, ·_n). The elements of this group are the set Z_n^* of elements
in Z_n that are relatively prime to n, so that each one has a unique inverse, modulo n:

Z_n^* = {[a]_n ∈ Z_n : gcd(a, n) = 1} .

To see that Z_n^* is well defined, note that for 0 ≤ a < n, we have a ≡ (a + kn)
(mod n) for all integers k. By Exercise 31.2-3, therefore, gcd(a, n) = 1 implies
gcd(a + kn, n) = 1 for all integers k. Since [a]_n = {a + kn : k ∈ Z}, the set Z_n^*
is well defined. An example of such a group is

Z_15^* = {1, 2, 4, 7, 8, 11, 13, 14} ,

where the group operation is multiplication modulo 15. (Here we denote an
element [a]_15 as a; for example, we denote [7]_15 as 7.) Figure 31.2(b) shows the
group (Z_15^*, ·_15). For example, 8 · 11 ≡ 13 (mod 15), working in Z_15^*. The
identity for this group is 1.
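To make the definition concrete, the following Python sketch lists the representatives of Z_n^* and checks closure under multiplication modulo n by brute force. The function name units is an assumption made for this illustration; math.gcd is from the Python standard library.

```python
from math import gcd


def units(n):
    """Return the representatives in {0, ..., n-1} that are relatively prime to n."""
    return [a for a in range(n) if gcd(a, n) == 1]


z15_star = units(15)
assert z15_star == [1, 2, 4, 7, 8, 11, 13, 14]

# Closure: the product of any two elements, reduced modulo 15, is again in the set.
assert all((a * b) % 15 in z15_star for a in z15_star for b in z15_star)

# The worked example from the text: 8 * 11 is congruent to 13 modulo 15.
assert (8 * 11) % 15 == 13
```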

Theorem  31.13 

The system (Z_n^*, ·_n) is a finite abelian group.

Proof Theorem 31.6 implies that (Z_n^*, ·_n) is closed. Associativity and
commutativity can be proved for ·_n as they were for +_n in the proof of Theorem 31.12.
The identity element is [1]_n. To show the existence of inverses, let a be an element
of Z_n^* and let (d, x, y) be returned by Extended-Euclid(a, n). Then, d = 1,
since a ∈ Z_n^*, and

ax + ny = 1    (31.19)

or, equivalently,

ax ≡ 1 (mod n) .

Thus, [x]_n is a multiplicative inverse of [a]_n, modulo n. Furthermore, we claim
that [x]_n ∈ Z_n^*. To see why, equation (31.19) demonstrates that the smallest
positive linear combination of x and n must be 1. Therefore, Theorem 31.2 implies
that gcd(x, n) = 1. We defer the proof that inverses are uniquely defined until
Corollary 31.26. ■

As an example of computing multiplicative inverses, suppose that a = 5 and
n = 11. Then Extended-Euclid(a, n) returns (d, x, y) = (1, −2, 1), so that
1 = 5 · (−2) + 11 · 1. Thus, [−2]_11 (i.e., [9]_11) is the multiplicative inverse of [5]_11.
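The computation above can be sketched in Python as follows. The names extended_euclid and mod_inverse are assumptions made for this illustration; the final reduction x mod n picks the smallest nonnegative representative of the class [x]_n.

```python
def extended_euclid(a, b):
    """Return (d, x, y) with d = gcd(a, b) = a*x + b*y, for nonnegative a, b."""
    if b == 0:
        return a, 1, 0
    d, x1, y1 = extended_euclid(b, a % b)
    return d, y1, x1 - (a // b) * y1


def mod_inverse(a, n):
    """Return the representative in {0, ..., n-1} of the inverse of a modulo n."""
    d, x, _ = extended_euclid(a, n)
    if d != 1:
        raise ValueError("a has no multiplicative inverse modulo n")
    return x % n  # Python's % always yields a value in {0, ..., n-1} for n > 0


assert extended_euclid(5, 11) == (1, -2, 1)
assert mod_inverse(5, 11) == 9
assert (5 * 9) % 11 == 1
```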

When working with the groups (Z_n, +_n) and (Z_n^*, ·_n) in the remainder of this
chapter, we follow the convenient practice of denoting equivalence classes by their
representative elements and denoting the operations +_n and ·_n by the usual
arithmetic notations + and · (or juxtaposition, so that ab = a · b) respectively.
Equivalences modulo n may also be interpreted as equations in Z_n. For example,
the following two statements are equivalent:

ax ≡ b (mod n) ,

[a]_n ·_n [x]_n = [b]_n .

As a further convenience, we sometimes refer to a group (S, ⊕) merely as S
when the operation ⊕ is understood from context. We may thus refer to the groups
(Z_n, +_n) and (Z_n^*, ·_n) as Z_n and Z_n^*, respectively.

We denote the (multiplicative) inverse of an element a by (a^{−1} mod n). Division
in Z_n^* is defined by the equation a/b ≡ ab^{−1} (mod n). For example, in Z_15^*
we have that 7^{−1} ≡ 13 (mod 15), since 7 · 13 = 91 ≡ 1 (mod 15), so that
4/7 ≡ 4 · 13 ≡ 7 (mod 15).

The size of Z_n^* is denoted φ(n). This function, known as Euler's phi function,
satisfies the equation

φ(n) = n ∏_{p : p is prime and p | n} (1 − 1/p) ,    (31.20)

so that p runs over all the primes dividing n (including n itself, if n is prime).
We shall not prove this formula here. Intuitively, we begin with a list of the n
remainders {0, 1, ..., n − 1} and then, for each prime p that divides n, cross out
every multiple of p in the list. For example, since the prime divisors of 45 are 3
and 5,

φ(45) = 45 (1 − 1/3)(1 − 1/5)
      = 45 (2/3)(4/5)
      = 24 .

If p is prime, then Z_p^* = {1, 2, ..., p − 1}, and

φ(p) = p (1 − 1/p)
     = p − 1 .    (31.21)

If n is composite, then φ(n) < n − 1, although it can be shown that

φ(n) > n / (e^γ ln ln n + 3/(ln ln n))    (31.22)

for n ≥ 3, where γ = 0.5772156649... is Euler's constant. A somewhat simpler
(but looser) lower bound for n > 5 is

φ(n) > n / (6 ln ln n) .    (31.23)

The lower bound (31.22) is essentially the best possible, since

lim inf_{n→∞} φ(n) / (n / ln ln n) = e^{−γ} .    (31.24)
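Equation (31.20) suggests a direct way to compute φ(n) once the prime divisors of n are known. The Python sketch below finds them by trial division, which is an assumption made for simplicity here and is practical only for small n; the function name euler_phi is likewise a choice made for this illustration. It then cross-checks the result against a brute-force count of the elements relatively prime to n.

```python
from math import gcd


def euler_phi(n):
    """Return phi(n) using the product over the distinct primes dividing n."""
    result = n
    m = n
    p = 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p          # strip every factor of p from m
            result -= result // p  # multiply result by (1 - 1/p), exactly
        p += 1
    if m > 1:                    # whatever remains is a single prime factor
        result -= result // m
    return result


assert euler_phi(45) == 24

# Cross-check against the definition: phi(n) is the size of Z_n^*.
for n in range(1, 200):
    assert euler_phi(n) == sum(1 for a in range(n) if gcd(a, n) == 1)
```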


Subgroups 

If (S, ⊕) is a group, S' ⊆ S, and (S', ⊕) is also a group, then (S', ⊕) is a subgroup
of (S, ⊕). For example, the even integers form a subgroup of the integers under the
operation of addition. The following theorem provides a useful tool for recognizing
subgroups.

Theorem 31.14 (A nonempty closed subset of a finite group is a subgroup)

If (S, ⊕) is a finite group and S' is any nonempty subset of S such that a ⊕ b ∈ S'
for all a, b ∈ S', then (S', ⊕) is a subgroup of (S, ⊕).

Proof We leave the proof as Exercise 31.3-3. ■

For example, the set {0, 2, 4, 6} forms a subgroup of Z_8, since it is nonempty
and closed under the operation + (that is, it is closed under +_8).

The following theorem provides an extremely useful constraint on the size of a
subgroup; we omit the proof.

Theorem 31.15 (Lagrange's theorem)

If (S, ⊕) is a finite group and (S', ⊕) is a subgroup of (S, ⊕), then |S'| is a divisor
of |S|. ■

A subgroup S' of a group S is a proper subgroup if S' ≠ S. We shall use the
following corollary in our analysis in Section 31.8 of the Miller-Rabin primality
test procedure.

Corollary 31.16

If S' is a proper subgroup of a finite group S, then |S'| ≤ |S|/2. ■


Subgroups  generated  by  an  element 

Theorem 31.14 gives us an easy way to produce a subgroup of a finite group (S, ⊕):
choose an element a and take all elements that can be generated from a using the
group operation. Specifically, define a^(k) for k ≥ 1 by

a^(k) = ⊕_{i=1}^{k} a = a ⊕ a ⊕ ··· ⊕ a    (k terms) .

For example, if we take a = 2 in the group Z_6, the sequence a^(1), a^(2), a^(3), ... is

2, 4, 0, 2, 4, 0, 2, 4, 0, ... .

In the group Z_n, we have a^(k) = ka mod n, and in the group Z_n^*, we have a^(k) =
a^k mod n. We define the subgroup generated by a, denoted ⟨a⟩ or (⟨a⟩, ⊕), by

⟨a⟩ = {a^(k) : k ≥ 1} .

We say that a generates the subgroup ⟨a⟩ or that a is a generator of ⟨a⟩. Since S is
finite, ⟨a⟩ is a finite subset of S, possibly including all of S. Since the associativity
of ⊕ implies

a^(i) ⊕ a^(j) = a^(i+j) ,

⟨a⟩ is closed and therefore, by Theorem 31.14, ⟨a⟩ is a subgroup of S. For example,
in Z_6, we have

⟨0⟩ = {0} ,
⟨1⟩ = {0, 1, 2, 3, 4, 5} ,
⟨2⟩ = {0, 2, 4} .

Similarly, in Z_7^*, we have

⟨1⟩ = {1} ,
⟨2⟩ = {1, 2, 4} ,
⟨3⟩ = {1, 2, 3, 4, 5, 6} .

The order of a (in the group S), denoted ord(a), is defined as the smallest
positive integer t such that a^(t) = e.
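The examples above can be reproduced by repeatedly applying the group operation. In this Python sketch, the function names generated and order, and the way the operation is passed in as a two-argument function, are assumptions made for illustration; the code presumes that op really is the operation of a finite group containing a.

```python
def generated(a, op):
    """Return the list a^(1), a^(2), ... up to (not including) the first repeat."""
    seen = []
    x = a
    while x not in seen:
        seen.append(x)
        x = op(x, a)  # a^(k+1) = a^(k) (+) a
    return seen


def order(a, op, identity):
    """Return the smallest positive t such that a^(t) equals the identity."""
    return generated(a, op).index(identity) + 1


def add6(x, y):
    return (x + y) % 6


def mul7(x, y):
    return (x * y) % 7


assert sorted(generated(0, add6)) == [0]
assert sorted(generated(1, add6)) == [0, 1, 2, 3, 4, 5]
assert sorted(generated(2, add6)) == [0, 2, 4]

assert sorted(generated(1, mul7)) == [1]
assert sorted(generated(2, mul7)) == [1, 2, 4]
assert sorted(generated(3, mul7)) == [1, 2, 3, 4, 5, 6]

assert order(2, add6, 0) == 3
assert order(3, mul7, 1) == 6
```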

Theorem  31.17 

For any finite group (S, ⊕) and any a ∈ S, the order of a is equal to the size of the
subgroup it generates, or ord(a) = |⟨a⟩|.

Proof Let t = ord(a). Since a^(t) = e and a^(t+k) = a^(t) ⊕ a^(k) = a^(k) for
k ≥ 1, if i > t, then a^(i) = a^(j) for some j < i. Thus, as we generate elements
by a, we see no new elements after a^(t). Thus, ⟨a⟩ = {a^(1), a^(2), ..., a^(t)},
and so |⟨a⟩| ≤ t. To show that |⟨a⟩| ≥ t, we show that each element of the
sequence a^(1), a^(2), ..., a^(t) is distinct. Suppose for the purpose of contradiction that
a^(i) = a^(j) for some i and j satisfying 1 ≤ i < j ≤ t. Then, a^(i+k) = a^(j+k)
for k ≥ 0. But this equality implies that a^(i+(t−j)) = a^(j+(t−j)) = e, a contradiction,
since i + (t − j) < t but t is the least positive value such that a^(t) = e. Therefore,
each element of the sequence a^(1), a^(2), ..., a^(t) is distinct, and |⟨a⟩| ≥ t. We
conclude that ord(a) = |⟨a⟩|. ■

Corollary  31.18 

The sequence a^(1), a^(2), ... is periodic with period t = ord(a); that is, a^(i) = a^(j)
if and only if i ≡ j (mod t). ■

Consistent with the above corollary, we define a^(0) as e and a^(i) as a^(i mod t),
where t = ord(a), for all integers i.

Corollary 31.19

If (S, ⊕) is a finite group with identity e, then for all a ∈ S,

a^(|S|) = e .

Proof Lagrange's theorem (Theorem 31.15) implies that ord(a) | |S|, and so
|S| ≡ 0 (mod t), where t = ord(a). Therefore, a^(|S|) = a^(0) = e. ■

Exercises 


31.3-1 

Draw the group operation tables for the groups (Z_4, +_4) and (Z_5^*, ·_5). Show that
these groups are isomorphic by exhibiting a one-to-one correspondence α between
their elements such that a + b ≡ c (mod 4) if and only if α(a) · α(b) ≡ α(c)
(mod 5).


31.3-2

List all subgroups of Z_9 and of Z_13^*.


31.3-3

Prove Theorem 31.14.


31.3-4

Show that if p is prime and e is a positive integer, then
φ(p^e) = p^{e−1}(p − 1) .


31.3-5

Show that for any integer n > 1 and for any a ∈ Z_n^*, the function f_a : Z_n^* → Z_n^*
defined by f_a(x) = ax mod n is a permutation of Z_n^*.


31.4  Solving  modular  linear  equations 

We now consider the problem of finding solutions to the equation

ax ≡ b (mod n) ,    (31.25)

where a > 0 and n > 0. This problem has several applications; for example,
we shall use it as part of the procedure for finding keys in the RSA public-key
cryptosystem in Section 31.7. We assume that a, b, and n are given, and we wish
to find all values of x, modulo n, that satisfy equation (31.25). The equation may
have zero, one, or more than one such solution.

Let ⟨a⟩ denote the subgroup of Z_n generated by a. Since ⟨a⟩ = {a^(x) : x > 0} =
{ax mod n : x > 0}, equation (31.25) has a solution if and only if [b] ∈ ⟨a⟩.
Lagrange's theorem (Theorem 31.15) tells us that |⟨a⟩| must be a divisor of n. The
following theorem gives us a precise characterization of ⟨a⟩.

Theorem 31.20

For any positive integers a and n, if d = gcd(a, n), then

⟨a⟩ = ⟨d⟩ = {0, d, 2d, ..., ((n/d) − 1)d}    (31.26)

in Z_n, and thus

|⟨a⟩| = n/d .

Proof We begin by showing that d ∈ ⟨a⟩. Recall that Extended-Euclid(a, n)
produces integers x' and y' such that ax' + ny' = d. Thus, ax' ≡ d (mod n), so
that d ∈ ⟨a⟩. In other words, d is a multiple of a in Z_n.

Since d ∈ ⟨a⟩, it follows that every multiple of d belongs to ⟨a⟩, because any
multiple of a multiple of a is itself a multiple of a. Thus, ⟨a⟩ contains every element
in {0, d, 2d, ..., ((n/d) − 1)d}. That is, ⟨d⟩ ⊆ ⟨a⟩.

We now show that ⟨a⟩ ⊆ ⟨d⟩. If m ∈ ⟨a⟩, then m = ax mod n for some
integer x, and so m = ax + ny for some integer y. However, d | a and d | n, and
so d | m by equation (31.4). Therefore, m ∈ ⟨d⟩.

Combining these results, we have that ⟨a⟩ = ⟨d⟩. To see that |⟨a⟩| = n/d,
observe that there are exactly n/d multiples of d between 0 and n − 1, inclusive. ■
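Theorem 31.20 is easy to spot-check by brute force for small moduli. The Python sketch below is only a numerical check, not a substitute for the proof; the function names cyclic_subgroup and multiples_of_gcd are assumptions made for this illustration.

```python
from math import gcd


def cyclic_subgroup(a, n):
    """Return the sorted elements of the set {ax mod n} in Z_n."""
    # x = 0, ..., n-1 suffices, because ax mod n repeats with period dividing n.
    return sorted({(a * x) % n for x in range(n)})


def multiples_of_gcd(a, n):
    """Return [0, d, 2d, ..., ((n/d) - 1)d] where d = gcd(a, n)."""
    d = gcd(a, n)
    return list(range(0, n, d))


for n in range(1, 40):
    for a in range(1, 40):
        assert cyclic_subgroup(a, n) == multiples_of_gcd(a, n)
        assert len(cyclic_subgroup(a, n)) == n // gcd(a, n)
```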

Corollary  31.21 

The equation ax ≡ b (mod n) is solvable for the unknown x if and only if d | b,
where d = gcd(a, n).

Proof The equation ax ≡ b (mod n) is solvable if and only if [b] ∈ ⟨a⟩, which
is the same as saying

(b mod n) ∈ {0, d, 2d, ..., ((n/d) − 1)d} ,

by Theorem 31.20. If 0 ≤ b < n, then b ∈ ⟨a⟩ if and only if d | b, since the
members of ⟨a⟩ are precisely the multiples of d. If b < 0 or b ≥ n, the corollary
then follows from the observation that d | b if and only if d | (b mod n), since b
and b mod n differ by a multiple of n, which is itself a multiple of d. ■

Corollary  31.22 

The equation ax ≡ b (mod n) either has d distinct solutions modulo n, where
d = gcd(a, n), or it has no solutions.

Proof If ax ≡ b (mod n) has a solution, then b ∈ ⟨a⟩. By Theorem 31.17,
ord(a) = |⟨a⟩|, and so Corollary 31.18 and Theorem 31.20 imply that the sequence
ai mod n, for i = 0, 1, ..., is periodic with period |⟨a⟩| = n/d. If b ∈ ⟨a⟩, then b
appears exactly d times in the sequence ai mod n, for i = 0, 1, ..., n − 1, since
the length-(n/d) block of values ⟨a⟩ repeats exactly d times as i increases from 0
to n − 1. The indices x of the d positions for which ax mod n = b are the solutions
of the equation ax ≡ b (mod n). ■

Theorem  31.23 

Let d = gcd(a, n), and suppose that d = ax' + ny' for some integers x' and y'
(for example, as computed by Extended-Euclid). If d | b, then the equation
ax ≡ b (mod n) has as one of its solutions the value x_0, where

x_0 = x'(b/d) mod n .


Proof We have

ax_0 ≡ ax'(b/d) (mod n)
     ≡ d(b/d) (mod n)    (because ax' ≡ d (mod n))
     ≡ b (mod n) ,

and thus x_0 is a solution to ax ≡ b (mod n). ■

Theorem  31.24 

Suppose that the equation ax ≡ b (mod n) is solvable (that is, d | b, where
d = gcd(a, n)) and that x_0 is any solution to this equation. Then, this
equation has exactly d distinct solutions, modulo n, given by x_i = x_0 + i(n/d) for
i = 0, 1, ..., d − 1.


Proof Because n/d > 0 and 0 ≤ i(n/d) < n for i = 0, 1, ..., d − 1, the
values x_0, x_1, ..., x_{d−1} are all distinct, modulo n. Since x_0 is a solution of ax ≡ b
(mod n), we have ax_0 mod n ≡ b (mod n). Thus, for i = 0, 1, ..., d − 1, we
have

ax_i mod n = a(x_0 + in/d) mod n
           = (ax_0 + ain/d) mod n
           = ax_0 mod n    (because d | a implies that ain/d is a multiple of n)
           ≡ b (mod n) ,

and hence ax_i ≡ b (mod n), making x_i a solution, too. By Corollary 31.22, the
equation ax ≡ b (mod n) has exactly d solutions, so that x_0, x_1, ..., x_{d−1} must
be all of them. ■


We have now developed the mathematics needed to solve the equation ax ≡ b
(mod n); the following algorithm prints all solutions to this equation. The inputs
a and n are arbitrary positive integers, and b is an arbitrary integer.

Modular-Linear-Equation-Solver(a, b, n)
1  (d, x', y') = Extended-Euclid(a, n)
2  if d | b
3      x_0 = x'(b/d) mod n
4      for i = 0 to d − 1
5          print (x_0 + i(n/d)) mod n
6  else print “no solutions”

As an example of the operation of this procedure, consider the equation 14x ≡
30 (mod 100) (here, a = 14, b = 30, and n = 100). Calling Extended-Euclid
in line 1, we obtain (d, x', y') = (2, −7, 1). Since 2 | 30, lines 3-5
execute. Line 3 computes x_0 = (−7)(15) mod 100 = 95. The loop on lines 4-5
prints the two solutions 95 and 45.
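The procedure and the worked example can be sketched in Python as follows. This version returns the solutions as a list instead of printing them, which is a design choice made here so that the result can be checked; the function names are likewise assumptions made for this illustration.

```python
def extended_euclid(a, b):
    """Return (d, x, y) with d = gcd(a, b) = a*x + b*y, for nonnegative a, b."""
    if b == 0:
        return a, 1, 0
    d, x1, y1 = extended_euclid(b, a % b)
    return d, y1, x1 - (a // b) * y1


def modular_linear_equation_solver(a, b, n):
    """Return all x in {0, ..., n-1} with a*x congruent to b modulo n."""
    d, x1, _ = extended_euclid(a, n)       # line 1: d = a*x1 + n*y1
    if b % d != 0:                         # line 2: does d divide b?
        return []                          # line 6: no solutions
    x0 = (x1 * (b // d)) % n               # line 3
    return [(x0 + i * (n // d)) % n for i in range(d)]  # lines 4-5


solutions = modular_linear_equation_solver(14, 30, 100)
assert solutions == [95, 45]
assert all((14 * x) % 100 == 30 for x in solutions)
```

An empty list plays the role of the “no solutions” message in line 6.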

The procedure Modular-Linear-Equation-Solver works as follows.
Line 1 computes d = gcd(a, n), along with two values x' and y' such that d =
ax' + ny', demonstrating that x' is a solution to the equation ax' ≡ d (mod n).
If d does not divide b, then the equation ax ≡ b (mod n) has no solution, by
Corollary 31.21. Line 2 checks to see whether d | b; if not, line 6 reports that there
are no solutions. Otherwise, line 3 computes a solution x_0 to ax ≡ b (mod n),
in accordance with Theorem 31.23. Given one solution, Theorem 31.24 states that
adding multiples of (n/d), modulo n, yields the other d − 1 solutions. The for
loop of lines 4-5 prints out all d solutions, beginning with x_0 and spaced n/d
apart, modulo n.

Modular-Linear-Equation-Solver performs O(lg n + gcd(a, n)) arithmetic
operations, since Extended-Euclid performs O(lg n) arithmetic operations,
and each iteration of the for loop of lines 4-5 performs a constant number of
arithmetic operations.

The  following  corollaries  of  Theorem  31.24  give  specializations  of  particular 
interest. 

Corollary  31.25 

For any n > 1, if gcd(a, n) = 1, then the equation ax ≡ b (mod n) has a unique
solution, modulo n. ■

If b = 1, a common case of considerable interest, the x we are looking for is a
multiplicative inverse of a, modulo n.

Corollary 31.26

For any n > 1, if gcd(a, n) = 1, then the equation ax ≡ 1 (mod n) has a unique
solution, modulo n. Otherwise, it has no solution. ■

Thanks to Corollary 31.26, we can use the notation a^{−1} mod n to refer to the
multiplicative inverse of a, modulo n, when a and n are relatively prime. If
gcd(a, n) = 1, then the unique solution to the equation ax ≡ 1 (mod n) is the
integer x returned by Extended-Euclid, since the equation

gcd(a, n) = 1 = ax + ny

implies ax ≡ 1 (mod n). Thus, we can compute a^{−1} mod n efficiently using
Extended-Euclid.

Exercises 


31.4-1 

Find all solutions to the equation 35x ≡ 10 (mod 50).


31.4-2

Prove that the equation ax ≡ ay (mod n) implies x ≡ y (mod n) whenever
gcd(a, n) = 1. Show that the condition gcd(a, n) = 1 is necessary by supplying a
counterexample with gcd(a, n) > 1.


31.4-3

Consider the following change to line 3 of the procedure Modular-Linear-
Equation-Solver:

3      x_0 = x'(b/d) mod (n/d)

Will this work? Explain why or why not.

31.4-4 *

Let p be prime and f(x) ≡ f_0 + f_1 x + ··· + f_t x^t (mod p) be a polynomial
of degree t, with coefficients f_i drawn from Z_p. We say that a ∈ Z_p
is a zero of f if f(a) ≡ 0 (mod p). Prove that if a is a zero of f, then
f(x) ≡ (x − a)g(x) (mod p) for some polynomial g(x) of degree t − 1. Prove
by induction on t that if p is prime, then a polynomial f(x) of degree t can have
at most t distinct zeros modulo p.


31.5  The  Chinese  remainder  theorem 

Around A.D. 100, the Chinese mathematician Sun-Tsu solved the problem of finding
those integers x that leave remainders 2, 3, and 2 when divided by 3, 5, and 7
respectively. One such solution is x = 23; all solutions are of the form 23 + 105k
for arbitrary integers k. The “Chinese remainder theorem” provides a correspondence
between a system of equations modulo a set of pairwise relatively prime
moduli (for example, 3, 5, and 7) and an equation modulo their product (for
example, 105).

The Chinese remainder theorem has two major applications. Let the integer
n be factored as n = n_1 n_2 ··· n_k, where the factors n_i are pairwise relatively
prime. First, the Chinese remainder theorem is a descriptive “structure theorem”
that describes the structure of Z_n as identical to that of the Cartesian product
Z_{n_1} × Z_{n_2} × ··· × Z_{n_k} with componentwise addition and multiplication modulo n_i
in the ith component. Second, this description helps us to design efficient
algorithms, since working in each of the systems Z_{n_i} can be more efficient (in terms of
bit operations) than working modulo n.

Theorem  31.27  ( Chinese  remainder  theorem) 

Let n = n_1 n_2 ··· n_k, where the n_i are pairwise relatively prime. Consider the
correspondence

a ↔ (a_1, a_2, ..., a_k) ,    (31.27)

where a ∈ Z_n, a_i ∈ Z_{n_i}, and

a_i = a mod n_i

for i = 1, 2, ..., k. Then, mapping (31.27) is a one-to-one correspondence
(bijection) between Z_n and the Cartesian product Z_{n_1} × Z_{n_2} × ··· × Z_{n_k}. Operations
performed on the elements of Z_n can be equivalently performed on the corresponding
k-tuples by performing the operations independently in each coordinate position in
the appropriate system. That is, if

a ↔ (a_1, a_2, ..., a_k) ,
b ↔ (b_1, b_2, ..., b_k) ,

then

(a + b) mod n ↔ ((a_1 + b_1) mod n_1, ..., (a_k + b_k) mod n_k) ,    (31.28)
(a − b) mod n ↔ ((a_1 − b_1) mod n_1, ..., (a_k − b_k) mod n_k) ,    (31.29)
(ab) mod n ↔ (a_1 b_1 mod n_1, ..., a_k b_k mod n_k) .    (31.30)


Proof Transforming between the two representations is fairly straightforward.
Going from a to (a_1, a_2, ..., a_k) is quite easy and requires only k “mod”
operations.

Computing a from inputs (a_1, a_2, ..., a_k) is a bit more complicated. We begin
by defining m_i = n/n_i for i = 1, 2, ..., k; thus m_i is the product of all of the n_j's
other than n_i: m_i = n_1 n_2 ··· n_{i−1} n_{i+1} ··· n_k. We next define

c_i = m_i (m_i^{−1} mod n_i)    (31.31)

for i = 1, 2, ..., k. Equation (31.31) is always well defined: since m_i and n_i are
relatively prime (by Theorem 31.6), Corollary 31.26 guarantees that m_i^{−1} mod n_i
exists. Finally, we can compute a as a function of a_1, a_2, ..., a_k as follows:

a ≡ (a_1 c_1 + a_2 c_2 + ··· + a_k c_k) (mod n) .    (31.32)

We now show that equation (31.32) ensures that a ≡ a_i (mod n_i) for i =
1, 2, ..., k. Note that if j ≠ i, then m_j ≡ 0 (mod n_i), which implies that c_j ≡
m_j ≡ 0 (mod n_i). Note also that c_i ≡ 1 (mod n_i), from equation (31.31). We
thus have the appealing and useful correspondence

c_i ↔ (0, 0, ..., 0, 1, 0, ..., 0) ,

a vector that has 0s everywhere except in the ith coordinate, where it has a 1; the c_i
thus form a “basis” for the representation, in a certain sense. For each i, therefore,
we have

a ≡ a_i c_i                         (mod n_i)
  ≡ a_i m_i (m_i^{−1} mod n_i)      (mod n_i)
  ≡ a_i                             (mod n_i) ,

which is what we wished to show: our method of computing a from the a_i's
produces a result a that satisfies the constraints a ≡ a_i (mod n_i) for i = 1, 2, ..., k.
The correspondence is one-to-one, since we can transform in both directions.
Finally, equations (31.28)-(31.30) follow directly from Exercise 31.1-7, since
x mod n_i = (x mod n) mod n_i for any x and i = 1, 2, ..., k. ■

We  shall  use  the  following  corollaries  later  in  this  chapter. 

Corollary 31.28

If $n_1, n_2, \ldots, n_k$ are pairwise relatively prime and $n = n_1 n_2 \cdots n_k$, then for any
integers $a_1, a_2, \ldots, a_k$, the set of simultaneous equations

$x \equiv a_i \pmod{n_i}$ ,

for $i = 1, 2, \ldots, k$, has a unique solution modulo $n$ for the unknown $x$.  ■


Corollary 31.29

If $n_1, n_2, \ldots, n_k$ are pairwise relatively prime and $n = n_1 n_2 \cdots n_k$, then for all
integers $x$ and $a$,

$x \equiv a \pmod{n_i}$

for $i = 1, 2, \ldots, k$ if and only if

$x \equiv a \pmod{n}$ .
Figure 31.3  An illustration of the Chinese remainder theorem for $n_1 = 5$ and $n_2 = 13$. For this
example, $c_1 = 26$ and $c_2 = 40$. In row $i$, column $j$ is shown the value of $a$, modulo 65, such
that $a \bmod 5 = i$ and $a \bmod 13 = j$. Note that row 0, column 0 contains a 0. Similarly, row 4,
column 12 contains a 64 (equivalent to $-1$). Since $c_1 = 26$, moving down a row increases $a$ by 26.
Similarly, $c_2 = 40$ means that moving right by a column increases $a$ by 40. Increasing $a$ by 1
corresponds to moving diagonally downward and to the right, wrapping around from the bottom to
the top and from the right to the left.

As an example of the application of the Chinese remainder theorem, suppose we
are given the two equations

$a \equiv 2 \pmod{5}$ ,
$a \equiv 3 \pmod{13}$ ,

so that $a_1 = 2$, $n_1 = m_2 = 5$, $a_2 = 3$, and $n_2 = m_1 = 13$, and we wish
to compute $a \bmod 65$, since $n = n_1 n_2 = 65$. Because $13^{-1} \equiv 2 \pmod{5}$ and
$5^{-1} \equiv 8 \pmod{13}$, we have

$c_1 = 13 (2 \bmod 5) = 26$ ,
$c_2 = 5 (8 \bmod 13) = 40$ ,

and

$a \equiv 2 \cdot 26 + 3 \cdot 40 \pmod{65}$
$\equiv 52 + 120 \pmod{65}$
$\equiv 42 \pmod{65}$ .

See Figure 31.3 for an illustration of the Chinese remainder theorem, modulo 65.

Thus, we can work modulo $n$ by working modulo $n$ directly or by working in the
transformed representation using separate modulo $n_i$ computations, as convenient.
The computations are entirely equivalent.
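The reconstruction in equations (31.31) and (31.32) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the function names `to_residues` and `crt` are our own, and the sketch assumes Python 3.8 or later, where `pow(m, -1, n)` returns a modular inverse and `math.prod` is available.

```python
from math import prod


def to_residues(a, moduli):
    """Map a to its tuple (a mod n_1, ..., a mod n_k)."""
    return tuple(a % n_i for n_i in moduli)


def crt(residues, moduli):
    """Return the unique a in [0, n) with a = residues[i] (mod moduli[i]).

    Assumes the moduli are pairwise relatively prime; otherwise pow() raises
    ValueError because some m_i has no inverse modulo n_i.
    """
    n = prod(moduli)
    a = 0
    for a_i, n_i in zip(residues, moduli):
        m_i = n // n_i                 # product of all moduli other than n_i
        c_i = m_i * pow(m_i, -1, n_i)  # equation (31.31)
        a += a_i * c_i                 # one term of equation (31.32)
    return a % n


moduli = (5, 13)
print(crt((2, 3), moduli))  # the worked example above: 42

# Multiplying coordinate by coordinate agrees with multiplying modulo n.
x, y = 42, 17
pairs = zip(to_residues(x, moduli), to_residues(y, moduli), moduli)
product = tuple((u * v) % n_i for u, v, n_i in pairs)
assert crt(product, moduli) == (x * y) % prod(moduli)
```

The final assertion checks one instance of equation (31.30): the product is computed separately in each coordinate and then mapped back to a single value modulo 65.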


Exercises 


31.5-1 

Find all solutions to the equations $x \equiv 4 \pmod{5}$ and $x \equiv 5 \pmod{11}$.

31.5-2

Find all integers $x$ that leave remainders 1, 2, 3 when divided by 9, 8, 7 respectively.


31.5-3

Argue that, under the definitions of Theorem 31.27, if $\gcd(a, n) = 1$, then
$(a^{-1} \bmod n) \leftrightarrow ((a_1^{-1} \bmod n_1), (a_2^{-1} \bmod n_2), \ldots, (a_k^{-1} \bmod n_k))$ .

31.5-4

Under the definitions of Theorem 31.27, prove that for any polynomial $f$, the
number of roots of the equation $f(x) \equiv 0 \pmod{n}$ equals the product of the number
of roots of each of the equations $f(x) \equiv 0 \pmod{n_1}$, $f(x) \equiv 0 \pmod{n_2}$, ...,
$f(x) \equiv 0 \pmod{n_k}$.


31.6  Powers  of  an  element 

Just as we often consider the multiples of a given element $a$, modulo $n$, we consider
the sequence of powers of $a$, modulo $n$, where $a \in \mathbb{Z}_n^*$:

$a^0, a^1, a^2, a^3, \ldots$ ,  (31.33)

modulo $n$. Indexing from 0, the 0th value in this sequence is $a^0 \bmod n = 1$, and
the $i$th value is $a^i \bmod n$. For example, the powers of 3 modulo 7 are

i            0  1  2  3  4  5  6  7  8  9  10  11  ...
3^i mod 7    1  3  2  6  4  5  1  3  2  6   4   5  ...

whereas the powers of 2 modulo 7 are

i            0  1  2  3  4  5  6  7  8  9  10  11  ...
2^i mod 7    1  2  4  1  2  4  1  2  4  1   2   4  ...

In this section, let $\langle a \rangle$ denote the subgroup of $\mathbb{Z}_n^*$ generated by $a$ by repeated
multiplication, and let $\mathrm{ord}_n(a)$ (the “order of $a$, modulo $n$”) denote the order of $a$
in $\mathbb{Z}_n^*$. For example, $\langle 2 \rangle = \{1, 2, 4\}$ in $\mathbb{Z}_7^*$, and $\mathrm{ord}_7(2) = 3$. Using the definition of
the Euler phi function $\phi(n)$ as the size of $\mathbb{Z}_n^*$ (see Section 31.3), we now translate
Corollary 31.19 into the notation of $\mathbb{Z}_n^*$ to obtain Euler’s theorem and specialize it
to $\mathbb{Z}_p^*$, where $p$ is prime, to obtain Fermat’s theorem.

Theorem 31.30 (Euler’s theorem)

For any integer $n > 1$,

$a^{\phi(n)} \equiv 1 \pmod{n}$ for all $a \in \mathbb{Z}_n^*$ .  ■

Theorem 31.31 (Fermat’s theorem)

If $p$ is prime, then

$a^{p-1} \equiv 1 \pmod{p}$ for all $a \in \mathbb{Z}_p^*$ .

Proof  By equation (31.21), $\phi(p) = p - 1$ if $p$ is prime.  ■

Fermat’s theorem applies to every element in $\mathbb{Z}_p$ except 0, since $0 \notin \mathbb{Z}_p^*$. For all
$a \in \mathbb{Z}_p$, however, we have $a^p \equiv a \pmod{p}$ if $p$ is prime.
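Both theorems are easy to check numerically for small moduli. The following sketch is only a sanity check under our own naming (`units` and `phi` are illustrative helpers that enumerate $\mathbb{Z}_n^*$ by brute force); it is not an efficient way to compute $\phi(n)$.

```python
from math import gcd


def units(n):
    """Elements of Z_n^*: the residues in [1, n) relatively prime to n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def phi(n):
    """Euler's phi function, computed as the size of Z_n^* (brute force)."""
    return len(units(n))


# Euler's theorem: a^phi(n) = 1 (mod n) for every a in Z_n^*.
for n in (7, 12, 35):
    assert all(pow(a, phi(n), n) == 1 for a in units(n))

# Fermat's theorem for p = 7, and the variant a^p = a (mod p),
# which also holds for a = 0.
p = 7
assert all(pow(a, p - 1, p) == 1 for a in units(p))
assert all(pow(a, p, p) == a for a in range(p))
```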

If $\mathrm{ord}_n(g) = |\mathbb{Z}_n^*|$, then every element in $\mathbb{Z}_n^*$ is a power of $g$, modulo $n$, and
$g$ is a primitive root or a generator of $\mathbb{Z}_n^*$. For example, 3 is a primitive root,
modulo 7, but 2 is not a primitive root, modulo 7. If $\mathbb{Z}_n^*$ possesses a primitive
root, the group $\mathbb{Z}_n^*$ is cyclic. We omit the proof of the following theorem, which is
proven by Niven and Zuckerman [265].

Theorem 31.32

The values of $n > 1$ for which $\mathbb{Z}_n^*$ is cyclic are 2, 4, $p^e$, and $2p^e$, for all primes
$p > 2$ and all positive integers $e$.  ■


If $g$ is a primitive root of $\mathbb{Z}_n^*$ and $a$ is any element of $\mathbb{Z}_n^*$, then there exists a $z$ such
that $g^z \equiv a \pmod{n}$. This $z$ is a discrete logarithm or an index of $a$, modulo $n$,
to the base $g$; we denote this value as $\mathrm{ind}_{n,g}(a)$.
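For small moduli, orders, primitive roots, and indices can all be found by exhaustive search. The sketch below is illustrative only: the names `order`, `primitive_roots`, and `index` are our own, and the brute-force loops are practical only for tiny $n$ (no efficient general algorithm for discrete logarithms is implied).

```python
from math import gcd


def order(a, n):
    """Smallest t >= 1 with a^t = 1 (mod n); requires n > 1 and gcd(a, n) = 1."""
    if n <= 1 or gcd(a, n) != 1:
        raise ValueError("a must be an element of Z_n^* with n > 1")
    t, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        t += 1
    return t


def primitive_roots(n):
    """All g in Z_n^* whose order equals |Z_n^*| (empty if Z_n^* is not cyclic)."""
    group = [a for a in range(1, n) if gcd(a, n) == 1]
    return [g for g in group if order(g, n) == len(group)]


def index(a, g, n):
    """Brute-force discrete logarithm: the least z >= 0 with g^z = a (mod n)."""
    x = 1
    for z in range(order(g, n)):
        if x == a % n:
            return z
        x = (x * g) % n
    raise ValueError("a is not a power of g modulo n")


assert order(2, 7) == 3 and order(3, 7) == 6
assert primitive_roots(7) == [3, 5]
assert primitive_roots(8) == []    # 8 is not of the form 2, 4, p^e, or 2p^e
assert index(6, 3, 7) == 3         # 3^3 = 27 = 6 (mod 7)
```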

Theorem 31.33 (Discrete logarithm theorem)

If $g$ is a primitive root of $\mathbb{Z}_n^*$, then the equation $g^x \equiv g^y \pmod{n}$ holds if and
only if the equation $x \equiv y \pmod{\phi(n)}$ holds.

Proof  Suppose first that $x \equiv y \pmod{\phi(n)}$. Then, $x = y + k\phi(n)$ for some
integer $k$. Therefore,

$g^x \equiv g^{y + k\phi(n)} \pmod{n}$
$\equiv g^y \cdot (g^{\phi(n)})^k \pmod{n}$
$\equiv g^y \cdot 1^k \pmod{n}$  (by Euler’s theorem)
$\equiv g^y \pmod{n}$ .

Conversely, suppose that $g^x \equiv g^y \pmod{n}$. Because the sequence of powers of $g$
generates every element of $\langle g \rangle$ and $|\langle g \rangle| = \phi(n)$, Corollary 31.18 implies that
the sequence of powers of $g$ is periodic with period $\phi(n)$. Therefore, if $g^x \equiv g^y \pmod{n}$,
then we must have $x \equiv y \pmod{\phi(n)}$.  ■

We now turn our attention to the square roots of 1, modulo a prime power. The
following theorem will be useful in our development of a primality-testing
algorithm in Section 31.8.

Theorem 31.34

If $p$ is an odd prime and $e \ge 1$, then the equation

$x^2 \equiv 1 \pmod{p^e}$  (31.34)

has only two solutions, namely $x = 1$ and $x = -1$.

Proof  Equation (31.34) is equivalent to

$p^e \mid (x - 1)(x + 1)$ .

Since $p > 2$, we can have $p \mid (x - 1)$ or $p \mid (x + 1)$, but not both. (Otherwise,
by property (31.3), $p$ would also divide their difference $(x + 1) - (x - 1) = 2$.)
If $p \nmid (x - 1)$, then $\gcd(p^e, x - 1) = 1$, and by Corollary 31.5, we would have
$p^e \mid (x + 1)$. That is, $x \equiv -1 \pmod{p^e}$. Symmetrically, if $p \nmid (x + 1)$,
then $\gcd(p^e, x + 1) = 1$, and Corollary 31.5 implies that $p^e \mid (x - 1)$, so that
$x \equiv 1 \pmod{p^e}$. Therefore, either $x \equiv -1 \pmod{p^e}$ or $x \equiv 1 \pmod{p^e}$.  ■

A number $x$ is a nontrivial square root of 1, modulo $n$, if it satisfies the equation
$x^2 \equiv 1 \pmod{n}$ but $x$ is equivalent to neither of the two “trivial” square roots:
1 or $-1$, modulo $n$. For example, 6 is a nontrivial square root of 1, modulo 35.
We shall use the following corollary to Theorem 31.34 in the correctness proof in
Section 31.8 for the Miller-Rabin primality-testing procedure.

Corollary 31.35

If there exists a nontrivial square root of 1, modulo $n$, then $n$ is composite.

Proof  By the contrapositive of Theorem 31.34, if there exists a nontrivial square
root of 1, modulo $n$, then $n$ cannot be an odd prime or a power of an odd prime.
If $x^2 \equiv 1 \pmod{2}$, then $x \equiv 1 \pmod{2}$, and so all square roots of 1, modulo 2,
are trivial. Thus, $n$ cannot be prime. Finally, we must have $n > 1$ for a nontrivial
square root of 1 to exist. Therefore, $n$ must be composite.  ■
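A short exhaustive search makes the statements of Theorem 31.34 and Corollary 31.35 concrete for small moduli. The helper names `square_roots_of_one` and `nontrivial_square_roots` are our own and serve only as an illustration.

```python
def square_roots_of_one(n):
    """All x in [1, n) with x^2 = 1 (mod n), found by exhaustive search."""
    return [x for x in range(1, n) if (x * x) % n == 1]


def nontrivial_square_roots(n):
    """Square roots of 1 modulo n other than 1 and n - 1 (that is, -1)."""
    return [x for x in square_roots_of_one(n) if x not in (1, n - 1)]


# Modulo an odd prime power, only the trivial roots 1 and -1 exist.
assert square_roots_of_one(7) == [1, 6]
assert square_roots_of_one(27) == [1, 26]

# Modulo the composite 35 = 5 * 7, the text's example 6 appears, and so does 29.
assert square_roots_of_one(35) == [1, 6, 29, 34]
assert nontrivial_square_roots(35) == [6, 29]
```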

Raising  to  powers  with  repeated  squaring 

A frequently occurring operation in number-theoretic computations is raising one
number to a power modulo another number, also known as modular exponentiation.
More precisely, we would like an efficient way to compute $a^b \bmod n$, where
$a$ and $b$ are nonnegative integers and $n$ is a positive integer. Modular
exponentiation is an essential operation in many primality-testing routines and in the RSA
public-key cryptosystem. The method of repeated squaring solves this problem
efficiently using the binary representation of $b$.

Let $\langle b_k, b_{k-1}, \ldots, b_1, b_0 \rangle$ be the binary representation of $b$. (That is, the binary
representation is $k + 1$ bits long, $b_k$ is the most significant bit, and $b_0$ is the least
significant bit.) The following procedure computes $a^c \bmod n$ as $c$ is increased by
doublings and incrementations from 0 to $b$.

Figure 31.4  The results of MODULAR-EXPONENTIATION when computing $a^b \pmod{n}$, where
$a = 7$, $b = 560 = \langle 1000110000 \rangle$, and $n = 561$. The values are shown after each execution of the
for loop. The final result is 1.

MODULAR-EXPONENTIATION(a, b, n)
 1  c = 0
 2  d = 1
 3  let ⟨b_k, b_{k-1}, ..., b_0⟩ be the binary representation of b
 4  for i = k downto 0
 5      c = 2c
 6      d = (d · d) mod n
 7      if b_i == 1
 8          c = c + 1
 9          d = (d · a) mod n
10  return d

The  essential  use  of  squaring  in  line  6  of  each  iteration  explains  the  name  “repeated 
squaring.”  As  an  example,  for  a  =  7,  b  =  560,  and  n  =  561,  the  algorithm 
computes  the  sequence  of  values  modulo  561  shown  in  Figure  31.4;  the  sequence 
of  exponents  used  appears  in  the  row  of  the  table  labeled  by  c. 

The variable $c$ is not really needed by the algorithm but is included for the
following two-part loop invariant:

Just prior to each iteration of the for loop of lines 4-9,

1. The value of $c$ is the same as the prefix $\langle b_k, b_{k-1}, \ldots, b_{i+1} \rangle$ of the binary
representation of $b$, and

2. $d = a^c \bmod n$.

We use this loop invariant as follows:

Initialization: Initially, $i = k$, so that the prefix $\langle b_k, b_{k-1}, \ldots, b_{i+1} \rangle$ is empty,
which corresponds to $c = 0$. Moreover, $d = 1 = a^0 \bmod n$.

Maintenance: Let $c'$ and $d'$ denote the values of $c$ and $d$ at the end of an iteration
of the for loop, and thus the values prior to the next iteration. Each iteration
updates $c' = 2c$ (if $b_i = 0$) or $c' = 2c + 1$ (if $b_i = 1$), so that $c$ will be correct
prior to the next iteration. If $b_i = 0$, then $d' = d^2 \bmod n = (a^c)^2 \bmod n =
a^{2c} \bmod n = a^{c'} \bmod n$. If $b_i = 1$, then $d' = d^2 a \bmod n = (a^c)^2 a \bmod n =
a^{2c+1} \bmod n = a^{c'} \bmod n$. In either case, $d = a^c \bmod n$ prior to the next
iteration.

Termination: At termination, $i = -1$. Thus, $c = b$, since $c$ has the value of the
prefix $\langle b_k, b_{k-1}, \ldots, b_0 \rangle$ of $b$'s binary representation. Hence $d = a^c \bmod n =
a^b \bmod n$.

If the inputs $a$, $b$, and $n$ are $\beta$-bit numbers, then the total number of
arithmetic operations required is $O(\beta)$ and the total number of bit operations required
is $O(\beta^3)$.
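As a rough translation of the procedure into Python, the sketch below scans the bits of $b$ from most significant to least significant and keeps the auxiliary variable $c$ only so that the loop invariant can be asserted after each iteration. The function name `modular_exponentiation` is our own, and the invariant check calls Python's built-in `pow` purely for verification; production code would simply call `pow(a, b, n)`.

```python
def modular_exponentiation(a, b, n):
    """Compute a^b mod n by repeated squaring, scanning b's bits left to right."""
    c = 0  # the prefix of b consumed so far (kept only to check the invariant)
    d = 1
    for bit in bin(b)[2:]:        # bits b_k, ..., b_0 as characters '0' or '1'
        c = 2 * c
        d = (d * d) % n           # the squaring step (line 6 of the pseudocode)
        if bit == "1":
            c = c + 1
            d = (d * a) % n
        assert d == pow(a, c, n)  # loop invariant: d = a^c mod n
    assert c == b                 # at termination the prefix is all of b
    return d


print(modular_exponentiation(7, 560, 561))  # the example of Figure 31.4: 1
```

Each iteration performs at most two modular multiplications, which is where the $O(\beta)$ bound on arithmetic operations comes from.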

Exercises 


31.6-1 

Draw a table showing the order of every element in $\mathbb{Z}_{11}^*$. Pick the smallest primitive
root $g$ and compute a table giving $\mathrm{ind}_{11,g}(x)$ for all $x \in \mathbb{Z}_{11}^*$.


31.6-2

Give a modular exponentiation algorithm that examines the bits of $b$ from right to
left instead of left to right.


31.6-3

Assuming that you know $\phi(n)$, explain how to compute $a^{-1} \bmod n$ for any $a \in \mathbb{Z}_n^*$
using the procedure MODULAR-EXPONENTIATION.


31.7  The  RSA  public-key  cryptosystem 

With a public-key cryptosystem, we can encrypt messages sent between two
communicating parties so that an eavesdropper who overhears the encrypted messages
will not be able to decode them. A public-key cryptosystem also enables a party
to append an unforgeable “digital signature” to the end of an electronic message.
Such a signature is the electronic version of a handwritten signature on a paper
document. It can be easily checked by anyone, forged by no one, yet loses its validity
if any bit of the message is altered. It therefore provides authentication of both the
identity of the signer and the contents of the signed message. It is the perfect tool
for electronically signed business contracts, electronic checks, electronic purchase
orders, and other electronic communications that parties wish to authenticate.

The RSA public-key cryptosystem relies on the dramatic difference between the
ease of finding large prime numbers and the difficulty of factoring the product of
two large prime numbers. Section 31.8 describes an efficient procedure for finding
large prime numbers, and Section 31.9 discusses the problem of factoring large
integers.

Public-key  cryptosystems 

In a public-key cryptosystem, each participant has both a public key and a secret
key. Each key is a piece of information. For example, in the RSA cryptosystem,
each key consists of a pair of integers. The participants “Alice” and “Bob” are
traditionally used in cryptography examples; we denote their public and secret
keys as $P_A$, $S_A$ for Alice and $P_B$, $S_B$ for Bob.

Each participant creates his or her own public and secret keys. Secret keys are
kept secret, but public keys can be revealed to anyone or even published. In fact,
it is often convenient to assume that everyone’s public key is available in a
public directory, so that any participant can easily obtain the public key of any other
participant.

The public and secret keys specify functions that can be applied to any message.
Let $\mathcal{D}$ denote the set of permissible messages. For example, $\mathcal{D}$ might be the set of
all finite-length bit sequences. In the simplest, and original, formulation of
public-key cryptography, we require that the public and secret keys specify one-to-one
functions from $\mathcal{D}$ to itself. We denote the function corresponding to Alice’s public
key $P_A$ by $P_A()$ and the function corresponding to her secret key $S_A$ by $S_A()$. The
functions $P_A()$ and $S_A()$ are thus permutations of $\mathcal{D}$. We assume that the functions
$P_A()$ and $S_A()$ are efficiently computable given the corresponding key $P_A$ or $S_A$.

The public and secret keys for any participant are a “matched pair” in that they
specify functions that are inverses of each other. That is,

$M = S_A(P_A(M))$ ,  (31.35)
$M = P_A(S_A(M))$  (31.36)

for any message $M \in \mathcal{D}$. Transforming $M$ with the two keys $P_A$ and $S_A$
successively, in either order, yields the message $M$ back.

In a public-key cryptosystem, we require that no one but Alice be able to
compute the function $S_A()$ in any practical amount of time. This assumption is crucial
to keeping encrypted mail sent to Alice private and to knowing that Alice’s
digital signatures are authentic. Alice must keep $S_A$ secret; if she does not, she loses
her uniqueness and the cryptosystem cannot provide her with unique capabilities.
The assumption that only Alice can compute $S_A()$ must hold even though everyone
knows $P_A$ and can compute $P_A()$, the inverse function to $S_A()$, efficiently. In order
to design a workable public-key cryptosystem, we must figure out how to create
a system in which we can reveal a transformation $P_A()$ without thereby revealing
how to compute the corresponding inverse transformation $S_A()$. This task appears
formidable, but we shall see how to accomplish it.

Figure 31.5  Encryption in a public key system. Bob encrypts the message $M$ using Alice’s public
key $P_A$ and transmits the resulting ciphertext $C = P_A(M)$ over a communication channel to
Alice. An eavesdropper who captures the transmitted ciphertext gains no information about $M$. Alice
receives $C$ and decrypts it using her secret key to obtain the original message $M = S_A(C)$.

In a public-key cryptosystem, encryption works as shown in Figure 31.5.
Suppose Bob wishes to send Alice a message $M$ encrypted so that it will look like
unintelligible gibberish to an eavesdropper. The scenario for sending the message
goes as follows.

•  Bob obtains Alice’s public key $P_A$ (from a public directory or directly from
Alice).

•  Bob computes the ciphertext $C = P_A(M)$ corresponding to the message $M$
and sends $C$ to Alice.

•  When Alice receives the ciphertext $C$, she applies her secret key $S_A$ to retrieve
the original message: $S_A(C) = S_A(P_A(M)) = M$.

Because $S_A()$ and $P_A()$ are inverse functions, Alice can compute $M$ from $C$.
Because only Alice is able to compute $S_A()$, Alice is the only one who can compute $M$
from $C$. Because Bob encrypts $M$ using $P_A()$, only Alice can understand the
transmitted message.

We can just as easily implement digital signatures within our formulation of a
public-key cryptosystem. (There are other ways of approaching the problem of
constructing digital signatures, but we shall not go into them here.) Suppose now
that Alice wishes to send Bob a digitally signed response $M'$. Figure 31.6 shows
how the digital-signature scenario proceeds.

•  Alice computes her digital signature $\sigma$ for the message $M'$ using her secret
key $S_A$ and the equation $\sigma = S_A(M')$.

•  Alice sends the message/signature pair $(M', \sigma)$ to Bob.

•  When Bob receives $(M', \sigma)$, he can verify that it originated from Alice by
using Alice’s public key to verify the equation $M' = P_A(\sigma)$. (Presumably, $M'$
contains Alice’s name, so Bob knows whose public key to use.) If the equation
holds, then Bob concludes that the message $M'$ was actually signed by Alice.
If the equation fails to hold, Bob concludes either that the message $M'$ or the
digital signature $\sigma$ was corrupted by transmission errors or that the pair $(M', \sigma)$
is an attempted forgery.

Figure 31.6  Digital signatures in a public key system. Alice signs the message $M'$ by appending
her digital signature $\sigma = S_A(M')$ to it. She transmits the message/signature pair $(M', \sigma)$ to Bob,
who verifies it by checking the equation $M' = P_A(\sigma)$. If the equation holds, he accepts $(M', \sigma)$ as
a message that Alice has signed.

Because a digital signature provides both authentication of the signer’s identity and
authentication of the contents of the signed message, it is analogous to a
handwritten signature at the end of a written document.

A  digital  signature  must  be  verifiable  by  anyone  who  has  access  to  the  signer’s 
public  key.  A  signed  message  can  be  verified  by  one  party  and  then  passed  on  to 
other  parties  who  can  also  verify  the  signature.  For  example,  the  message  might 
be  an  electronic  check  from  Alice  to  Bob.  After  Bob  verifies  Alice’s  signature  on 
the  check,  he  can  give  the  check  to  his  bank,  who  can  then  also  verify  the  signature 
and  effect  the  appropriate  funds  transfer. 

A signed message is not necessarily encrypted; the message can be “in the clear”
and not protected from disclosure. By composing the above protocols for
encryption and for signatures, we can create messages that are both signed and encrypted.
The signer first appends his or her digital signature to the message and then
encrypts the resulting message/signature pair with the public key of the intended
recipient. The recipient decrypts the received message with his or her secret key to
obtain both the original message and its digital signature. The recipient can then
verify the signature using the public key of the signer. The corresponding
combined process using paper-based systems would be to sign the paper document and
then seal the document inside a paper envelope that is opened only by the intended
recipient.

The  RSA  cryptosystem 

In the RSA public-key cryptosystem, a participant creates his or her public and
secret keys with the following procedure:

1. Select at random two large prime numbers $p$ and $q$ such that $p \ne q$. The primes
$p$ and $q$ might be, say, 1024 bits each.

2. Compute $n = pq$.

3. Select a small odd integer $e$ that is relatively prime to $\phi(n)$, which, by
equation (31.20), equals $(p - 1)(q - 1)$.

4. Compute $d$ as the multiplicative inverse of $e$, modulo $\phi(n)$. (Corollary 31.26
guarantees that $d$ exists and is uniquely defined. We can use the technique of
Section 31.4 to compute $d$, given $e$ and $\phi(n)$.)

5. Publish the pair $P = (e, n)$ as the participant’s RSA public key.

6. Keep secret the pair $S = (d, n)$ as the participant’s RSA secret key.

For this scheme, the domain $\mathcal{D}$ is the set $\mathbb{Z}_n$. To transform a message $M$
associated with a public key $P = (e, n)$, compute

$P(M) = M^e \bmod n$ .  (31.37)

To transform a ciphertext $C$ associated with a secret key $S = (d, n)$, compute

$S(C) = C^d \bmod n$ .  (31.38)

These equations apply to both encryption and signatures. To create a signature, the
signer applies his or her secret key to the message to be signed, rather than to a
ciphertext. To verify a signature, the public key of the signer is applied to it, rather
than to a message to be encrypted.
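The key-generation steps and equations (31.37) and (31.38) can be traced with toy numbers. In the sketch below, the primes 61 and 53 and the exponent $e = 17$ are illustrative choices of our own, far too small to offer any security, and the helper names `make_keys` and `apply_key` are hypothetical. Python's `pow(e, -1, phi)` (available from Python 3.8) stands in for the technique of Section 31.4.

```python
from math import gcd


def make_keys(p, q, e):
    """Build an RSA key pair from distinct primes p and q and an odd exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to phi(n)")
    d = pow(e, -1, phi)       # multiplicative inverse of e modulo phi(n)
    return (e, n), (d, n)     # public key P = (e, n), secret key S = (d, n)


def apply_key(key, x):
    """Apply equation (31.37) or (31.38): raise x to the key's exponent modulo n."""
    exponent, n = key
    return pow(x, exponent, n)


public, secret = make_keys(61, 53, 17)   # toy parameters, NOT secure

# Encryption: Bob applies Alice's public key, Alice applies her secret key.
M = 65
C = apply_key(public, M)
assert apply_key(secret, C) == M

# Signature: Alice applies her secret key, anyone checks with her public key.
sigma = apply_key(secret, M)
assert apply_key(public, sigma) == M
```

The same function serves for both keys because the two transformations differ only in the exponent used.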

We can implement the public-key and secret-key operations using the procedure
MODULAR-EXPONENTIATION described in Section 31.6. To analyze the running
time of these operations, assume that the public key $(e, n)$ and secret key $(d, n)$
satisfy $\lg e = O(1)$, $\lg d \le \beta$, and $\lg n \le \beta$. Then, applying a public key requires
$O(1)$ modular multiplications and uses $O(\beta^2)$ bit operations. Applying a secret
key requires $O(\beta)$ modular multiplications, using $O(\beta^3)$ bit operations.

Theorem  31.36  (Correctness  of  RSA) 

The RSA equations (31.37) and (31.38) define inverse transformations of $\mathbb{Z}_n$
satisfying equations (31.35) and (31.36).

Proof  From equations (31.37) and (31.38), we have that for any $M \in \mathbb{Z}_n$,

$P(S(M)) = S(P(M)) = M^{ed} \pmod{n}$ .

Since $e$ and $d$ are multiplicative inverses modulo $\phi(n) = (p - 1)(q - 1)$,

$ed = 1 + k(p - 1)(q - 1)$

for some integer $k$. But then, if $M \not\equiv 0 \pmod{p}$, we have

$M^{ed} \equiv M (M^{p-1})^{k(q-1)} \pmod{p}$
$\equiv M ((M \bmod p)^{p-1})^{k(q-1)} \pmod{p}$
$\equiv M (1)^{k(q-1)} \pmod{p}$  (by Theorem 31.31)
$\equiv M \pmod{p}$ .

Also, $M^{ed} \equiv M \pmod{p}$ if $M \equiv 0 \pmod{p}$. Thus,

$M^{ed} \equiv M \pmod{p}$

for all $M$. Similarly,

$M^{ed} \equiv M \pmod{q}$

for all $M$. Thus, by Corollary 31.29 to the Chinese remainder theorem,

$M^{ed} \equiv M \pmod{n}$

for all $M$.  ■


The security of the RSA cryptosystem rests in large part on the difficulty of
factoring large integers. If an adversary can factor the modulus $n$ in a public key, then
the adversary can derive the secret key from the public key, using the knowledge
of the factors $p$ and $q$ in the same way that the creator of the public key used them.
Therefore, if factoring large integers is easy, then breaking the RSA cryptosystem
is easy. The converse statement, that if factoring large integers is hard, then
breaking RSA is hard, is unproven. After two decades of research, however, no easier
method has been found to break the RSA public-key cryptosystem than to factor
the modulus $n$. And as we shall see in Section 31.9, factoring large integers is
surprisingly difficult. By randomly selecting and multiplying together two 1024-bit
primes, we can create a public key that cannot be “broken” in any feasible amount
of time with current technology. In the absence of a fundamental breakthrough in
the design of number-theoretic algorithms, and when implemented with care
following recommended standards, the RSA cryptosystem is capable of providing a
high degree of security in applications.

In order to achieve security with the RSA cryptosystem, however, we should
use integers that are quite long (hundreds or even more than one thousand bits
long) to resist possible advances in the art of factoring. At the time of this
writing (2009), RSA moduli were commonly in the range of 768 to 2048 bits.
To create moduli of such sizes, we must be able to find large primes efficiently.
Section 31.8 addresses this problem.

For efficiency, RSA is often used in a “hybrid” or “key-management” mode
with fast non-public-key cryptosystems. With such a system, the encryption and
decryption keys are identical. If Alice wishes to send a long message $M$ to Bob
privately, she selects a random key $K$ for the fast non-public-key cryptosystem and
encrypts $M$ using $K$, obtaining ciphertext $C$. Here, $C$ is as long as $M$, but $K$
is quite short. Then, she encrypts $K$ using Bob’s public RSA key. Since $K$ is
short, computing $P_B(K)$ is fast (much faster than computing $P_B(M)$). She then
transmits $(C, P_B(K))$ to Bob, who decrypts $P_B(K)$ to obtain $K$ and then uses $K$
to decrypt $C$, obtaining $M$.

We can use a similar hybrid approach to make digital signatures efficiently.
This approach combines RSA with a public collision-resistant hash function $h$, a
function that is easy to compute but for which it is computationally infeasible to
find two messages $M$ and $M'$ such that $h(M) = h(M')$. The value $h(M)$ is
a short (say, 256-bit) “fingerprint” of the message $M$. If Alice wishes to sign a
message $M$, she first applies $h$ to $M$ to obtain the fingerprint $h(M)$, which she
then encrypts with her secret key. She sends $(M, S_A(h(M)))$ to Bob as her signed
version of $M$. Bob can verify the signature by computing $h(M)$ and verifying
that $P_A$ applied to $S_A(h(M))$ as received equals $h(M)$. Because no one can create
two messages with the same fingerprint, it is computationally infeasible to alter a
signed message and preserve the validity of the signature.
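The hash-then-sign idea can be sketched with Python's standard `hashlib` module. Everything specific here is an assumption made for illustration: SHA-256 plays the role of $h$, the fingerprint is reduced modulo $n$ so that it fits in $\mathbb{Z}_n$ (which weakens collision resistance and would not be done with a realistically sized modulus), and the key is built from the two small Mersenne primes $2^{31} - 1$ and $2^{61} - 1$ with $e = 65537$, which is nowhere near secure. The names `fingerprint`, `sign`, and `verify` are hypothetical, and real systems also apply a standardized padding scheme that this sketch omits.

```python
import hashlib

# Toy key material, NOT secure: two small Mersenne primes.
p, q = 2**31 - 1, 2**61 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))   # secret exponent (Python 3.8+)


def fingerprint(message: bytes) -> int:
    """h(M): SHA-256 digest of the message, reduced into Z_n for this toy key."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") % n


def sign(message: bytes) -> int:
    """Alice computes S_A(h(M)) with her secret exponent d."""
    return pow(fingerprint(message), d, n)


def verify(message: bytes, signature: int) -> bool:
    """Bob checks that P_A applied to the signature equals h(M)."""
    return pow(signature, e, n) == fingerprint(message)


message = b"Pay Bob 10 dollars"
signature = sign(message)
assert verify(message, signature)
assert not verify(b"Pay Bob 1000 dollars", signature)   # altered message fails
```

Only the short fingerprint passes through the expensive secret-key operation, regardless of how long the message is.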

Finally, we note that the use of certificates makes distributing public keys much
easier. For example, assume there is a “trusted authority” $T$ whose public key
is known by everyone. Alice can obtain from $T$ a signed message (her certificate)
stating that “Alice’s public key is $P_A$.” This certificate is “self-authenticating” since
everyone knows $P_T$. Alice can include her certificate with her signed messages,
so that the recipient has Alice’s public key immediately available in order to verify
her signature. Because her key was signed by $T$, the recipient knows that Alice’s
key is really Alice’s.

Exercises 


31.7-1 

Consider an RSA key set with $p = 11$, $q = 29$, $n = 319$, and $e = 3$. What
value of $d$ should be used in the secret key? What is the encryption of the message
$M = 100$?

31.7-2

Prove that if Alice’s public exponent $e$ is 3 and an adversary obtains Alice’s secret
exponent $d$, where $0 < d < \phi(n)$, then the adversary can factor Alice’s modulus $n$
in time polynomial in the number of bits in $n$. (Although you are not asked to prove
it, you may be interested to know that this result remains true even if the condition
$e = 3$ is removed. See Miller [255].)

31.7-3 ★

Prove that RSA is multiplicative in the sense that

$P_A(M_1) P_A(M_2) \equiv P_A(M_1 M_2) \pmod{n}$ .

Use this fact to prove that if an adversary had a procedure that could efficiently
decrypt 1 percent of messages from $\mathbb{Z}_n$ encrypted with $P_A$, then he could employ
a probabilistic algorithm to decrypt every message encrypted with $P_A$ with high
probability.


★  31.8  Primality  testing 

In  this  section,  we  consider  the  problem  of  finding  large  primes.  We  begin  with  a 
discussion  of  the  density  of  primes,  proceed  to  examine  a  plausible,  but  incomplete, 
approach  to  primality  testing,  and  then  present  an  effective  randomized  primality 
test  due  to  Miller  and  Rabin. 


The  density  of  prime  numbers 

For many applications, such as cryptography, we need to find large “random”
primes. Fortunately, large primes are not too rare, so that it is feasible to test
random integers of the appropriate size until we find a prime. The prime
distribution function $\pi(n)$ specifies the number of primes that are less than or equal to $n$.
For example, $\pi(10) = 4$, since there are 4 prime numbers less than or equal to 10,
namely, 2, 3, 5, and 7. The prime number theorem gives a useful approximation
to $\pi(n)$.


Theorem 31.37 (Prime number theorem)

$$\lim_{n \to \infty} \frac{\pi(n)}{n / \ln n} = 1 .$$


The approximation $n / \ln n$ gives reasonably accurate estimates of $\pi(n)$ even
for small $n$. For example, it is off by less than 6% at $n = 10^9$, where
$\pi(n) = 50{,}847{,}534$ and $n / \ln n \approx 48{,}254{,}942$. (To a number theorist,
$10^9$ is a small number.)

We can view the process of randomly selecting an integer $n$ and determining
whether it is prime as a Bernoulli trial (see Section C.4). By the prime number
theorem, the probability of a success (that is, the probability that $n$ is prime) is
approximately $1 / \ln n$. The geometric distribution tells us how many trials we need
to obtain a success, and by equation (C.32), the expected number of trials is
approximately $\ln n$. Thus, we would expect to examine approximately $\ln n$ integers
chosen randomly near $n$ in order to find a prime that is of the same length as $n$.
For example, we expect that finding a 1024-bit prime would require testing
approximately $\ln 2^{1024} \approx 710$ randomly chosen 1024-bit numbers for primality. (Of
course, we can cut this figure in half by choosing only odd integers.)
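As a rough numerical check of these estimates, the following Python sketch counts primes with a simple sieve and compares the count with $n / \ln n$. The function names `prime_count` and `expected_trials` are illustrative choices, not part of the text, and the sieve is only practical for modest $n$.

```python
import math

def prime_count(n):
    """Return pi(n), the number of primes <= n, using a sieve of Eratosthenes."""
    if n < 2:
        return 0
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross out the multiples of p, starting at p*p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(is_prime)

def expected_trials(bits):
    """Approximate number of random bits-bit integers to test: ln(2**bits)."""
    return bits * math.log(2)

if __name__ == "__main__":
    for n in (10, 10**3, 10**6):
        print(n, prime_count(n), round(n / math.log(n)))
    print(round(expected_trials(1024)))
```

The ratio of the two columns printed for each $n$ drifts toward 1 only slowly, which is consistent with the theorem being a statement about a limit.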

In  the  remainder  of  this  section,  we  consider  the  problem  of  determining  whether 
or  not  a  large  odd  integer  n  is  prime.  For  notational  convenience,  we  assume  that  n 
has  the  prime  factorization 

$$n = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r} , \tag{31.39}$$

where $r \ge 1$, $p_1, p_2, \ldots, p_r$ are the prime factors of $n$, and $e_1, e_2, \ldots, e_r$ are
positive integers. The integer $n$ is prime if and only if $r = 1$ and $e_1 = 1$.

One simple approach to the problem of testing for primality is trial division. We
try dividing $n$ by each integer $2, 3, \ldots, \lfloor \sqrt{n} \rfloor$. (Again, we may skip even integers
greater than 2.) It is easy to see that $n$ is prime if and only if none of the trial
divisors divides $n$. Assuming that each trial division takes constant time, the worst-case
running time is $\Theta(\sqrt{n})$, which is exponential in the length of $n$. (Recall that if $n$
is encoded in binary using $\beta$ bits, then $\beta = \lceil \lg(n + 1) \rceil$, and so $\sqrt{n} = \Theta(2^{\beta/2})$.)
Thus, trial division works well only if $n$ is very small or happens to have a small
prime factor. When it works, trial division has the advantage that it not only
determines whether $n$ is prime or composite, but also determines one of $n$’s prime
factors if $n$ is composite.
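Trial division as just described can be sketched in a few lines of Python. The function name `trial_division` and its return convention (smallest prime factor, or `None` for a prime) are assumptions made for this illustration.

```python
import math

def trial_division(n):
    """Return the smallest prime factor of n >= 2, or None if n is prime."""
    if n % 2 == 0:
        return None if n == 2 else 2
    # Skip even divisors greater than 2; stop at floor(sqrt(n)).
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return d
    return None
```

The first divisor found is necessarily prime, since any smaller factor of it would have divided $n$ earlier in the loop.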

In  this  section,  we  are  interested  only  in  finding  out  whether  a  given  number  n 
is  prime;  if  n  is  composite,  we  are  not  concerned  with  finding  its  prime  factor¬ 
ization.  As  we  shall  see  in  Section  31.9,  computing  the  prime  factorization  of  a 
number  is  computationally  expensive.  It  is  perhaps  surprising  that  it  is  much  easier 
to  tell  whether  or  not  a  given  number  is  prime  than  it  is  to  determine  the  prime 
factorization  of  the  number  if  it  is  not  prime. 

Pseudoprimality  testing 

We now consider a method for primality testing that “almost works” and in fact
is good enough for many practical applications. Later on, we shall present a
refinement of this method that removes the small defect. Let $\mathbb{Z}_n^+$ denote the nonzero
elements of $\mathbb{Z}_n$:

$$\mathbb{Z}_n^+ = \{1, 2, \ldots, n - 1\} .$$

If $n$ is prime, then $\mathbb{Z}_n^+ = \mathbb{Z}_n^*$.

We say that $n$ is a base-$a$ pseudoprime if $n$ is composite and

$$a^{n-1} \equiv 1 \pmod{n} . \tag{31.40}$$

Fermat’s theorem (Theorem 31.31) implies that if $n$ is prime, then $n$ satisfies
equation (31.40) for every $a$ in $\mathbb{Z}_n^+$. Thus, if we can find any $a \in \mathbb{Z}_n^+$ such that $n$ does
not satisfy equation (31.40), then $n$ is certainly composite. Surprisingly, the
converse almost holds, so that this criterion forms an almost perfect test for primality.
We test to see whether $n$ satisfies equation (31.40) for $a = 2$. If not, we declare $n$
to be composite by returning COMPOSITE. Otherwise, we return PRIME, guessing
that $n$ is prime (when, in fact, all we know is that $n$ is either prime or a base-2
pseudoprime).

The following procedure pretends in this manner to be checking the primality
of $n$. It uses the procedure MODULAR-EXPONENTIATION from Section 31.6. We
assume that the input $n$ is an odd integer greater than 2.

PSEUDOPRIME(n)

1  if MODULAR-EXPONENTIATION(2, n - 1, n) ≢ 1 (mod n)
2      return COMPOSITE    // definitely
3  else return PRIME       // we hope!

This  procedure  can  make  errors,  but  only  of  one  type.  That  is,  if  it  says  that  n 
is  composite,  then  it  is  always  correct.  If  it  says  that  n  is  prime,  however,  then  it 
makes  an  error  only  if  n  is  a  base-2  pseudoprime. 

How often does this procedure err? Surprisingly rarely. There are only 22 values
of $n$ less than 10,000 for which it errs; the first four such values are 341, 561,
645, and 1105. We won’t prove it, but the probability that this program makes an
error on a randomly chosen $\beta$-bit number goes to zero as $\beta \to \infty$. Using more
precise estimates due to Pomerance [279] of the number of base-2 pseudoprimes of
a given size, we may estimate that a randomly chosen 512-bit number that is called
prime by the above procedure has less than one chance in $10^{20}$ of being a base-2
pseudoprime, and a randomly chosen 1024-bit number that is called prime has less
than one chance in $10^{41}$ of being a base-2 pseudoprime. So if you are merely
trying to find a large prime for some application, for all practical purposes you
almost never go wrong by choosing large numbers at random until one of them
causes PSEUDOPRIME to return PRIME. But when the numbers being tested for
primality are not randomly chosen, we need a better approach for testing primality.
As we shall see, a little more cleverness, and some randomization, will yield a
primality-testing routine that works well on all inputs.
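The error count quoted above can be reproduced for small inputs. This Python sketch implements the base-2 test with the built-in three-argument `pow` standing in for MODULAR-EXPONENTIATION and compares it with a slow reference check. The helper names `is_prime_slow` and `base2_errors` are hypothetical, introduced only for this illustration.

```python
import math

def pseudoprime(n):
    """Base-2 Fermat test for odd n > 2: 'COMPOSITE' is certain, 'PRIME' is a guess."""
    # pow(2, n - 1, n) plays the role of MODULAR-EXPONENTIATION(2, n - 1, n).
    return "COMPOSITE" if pow(2, n - 1, n) != 1 else "PRIME"

def is_prime_slow(n):
    """Reference answer by trial division; only practical for small n."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

def base2_errors(limit):
    """Odd n in [3, limit) on which pseudoprime(n) wrongly answers 'PRIME'."""
    return [n for n in range(3, limit, 2)
            if pseudoprime(n) == "PRIME" and not is_prime_slow(n)]
```

Calling `base2_errors(10000)` lists exactly the base-2 pseudoprimes below 10,000, so its length and first entries can be compared with the figures in the text.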

Unfortunately, we cannot entirely eliminate all the errors by simply checking
equation (31.40) for a second base number, say $a = 3$, because there exist
composite integers $n$, known as Carmichael numbers, that satisfy equation (31.40) for
all $a \in \mathbb{Z}_n^*$. (We note that equation (31.40) does fail when $\gcd(a, n) > 1$, that
is, when $a \notin \mathbb{Z}_n^*$, but hoping to demonstrate that $n$ is composite by finding such
an $a$ can be difficult if $n$ has only large prime factors.) The first three Carmichael
numbers are 561, 1105, and 1729. Carmichael numbers are extremely rare; there
are, for example, only 255 of them less than 100,000,000. Exercise 31.8-2 helps
explain why they are so rare.
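For small $n$, the defining property of a Carmichael number can be checked by brute force. The sketch below is only an illustration (the name `is_carmichael` is an assumption), and it is far too slow for numbers of cryptographic size.

```python
import math

def is_carmichael(n):
    """True if n is composite and a^(n-1) = 1 (mod n) for every a coprime to n."""
    composite = n > 3 and any(n % d == 0 for d in range(2, math.isqrt(n) + 1))
    return composite and all(pow(a, n - 1, n) == 1
                             for a in range(2, n) if math.gcd(a, n) == 1)

if __name__ == "__main__":
    print([n for n in range(3, 2000, 2) if is_carmichael(n)])
    # A base that shares a factor with n still fails equation (31.40):
    print(pow(3, 560, 561) != 1)
```

The second line of the demonstration uses $a = 3$, which divides 561, to show the exception noted in the parenthetical remark above.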

We  next  show  how  to  improve  our  primality  test  so  that  it  won’t  be  fooled  by 
Carmichael  numbers. 

The  Miller-Rabin  randomized  primality  test 

The Miller-Rabin primality test overcomes the problems of the simple test
PSEUDOPRIME with two modifications:

*  It  tries  several  randomly  chosen  base  values  a  instead  of  just  one  base  value. 

*  While  computing  each  modular  exponentiation,  it  looks  for  a  nontrivial  square 
root  of  1,  modulo  n,  during  the  final  set  of  squarings.  If  it  finds  one,  it  stops 
and  returns  COMPOSITE.  Corollary  31.35  from  Section  31.6  justifies  detecting 
composites  in  this  manner. 

The pseudocode for the Miller-Rabin primality test follows. The input $n > 2$ is
the odd number to be tested for primality, and $s$ is the number of randomly
chosen base values from $\mathbb{Z}_n^+$ to be tried. The code uses the random-number generator
RANDOM described on page 117: RANDOM(1, n - 1) returns a randomly chosen
integer $a$ satisfying $1 \le a \le n - 1$. The code uses an auxiliary procedure WITNESS
such that WITNESS(a, n) is TRUE if and only if $a$ is a “witness” to the
compositeness of $n$ (that is, if it is possible using $a$ to prove, in a manner that we shall see,
that $n$ is composite). The test WITNESS(a, n) is an extension of, but more effective
than, the test

$$a^{n-1} \not\equiv 1 \pmod{n}$$

that formed the basis (using $a = 2$) for PSEUDOPRIME. We first present and
justify the construction of WITNESS, and then we shall show how we use it in the
Miller-Rabin primality test. Let $n - 1 = 2^t u$ where $t \ge 1$ and $u$ is odd; i.e.,
the binary representation of $n - 1$ is the binary representation of the odd integer $u$
followed by exactly $t$ zeros. Therefore, $a^{n-1} \equiv (a^u)^{2^t} \pmod{n}$, so that we can
compute $a^{n-1} \bmod n$ by first computing $a^u \bmod n$ and then squaring the result $t$
times successively.

WITNESS(a, n)

1  let t and u be such that t ≥ 1, u is odd, and n - 1 = 2^t u
2  x_0 = MODULAR-EXPONENTIATION(a, u, n)
3  for i = 1 to t
4      x_i = x_{i-1}^2 mod n
5      if x_i == 1 and x_{i-1} ≠ 1 and x_{i-1} ≠ n - 1
6          return TRUE
7  if x_t ≠ 1
8      return TRUE
9  return FALSE

This pseudocode for WITNESS computes $a^{n-1} \bmod n$ by first computing the
value $x_0 = a^u \bmod n$ in line 2 and then squaring the result $t$ times in a row in the
for loop of lines 3-6. By induction on $i$, the sequence $x_0, x_1, \ldots, x_t$ of values
computed satisfies the equation $x_i \equiv a^{2^i u} \pmod{n}$ for $i = 0, 1, \ldots, t$, so that in
particular $x_t \equiv a^{n-1} \pmod{n}$. After line 4 performs a squaring step, however,
the loop may terminate early if lines 5-6 detect that a nontrivial square root of 1
has just been discovered. (We shall explain these tests shortly.) If so, the
algorithm stops and returns TRUE. Lines 7-8 return TRUE if the value computed for
$x_t \equiv a^{n-1} \pmod{n}$ is not equal to 1, just as the PSEUDOPRIME procedure returns
COMPOSITE in this case. Line 9 returns FALSE if we haven’t returned TRUE in
lines 6 or 8.

We now argue that if WITNESS(a, n) returns TRUE, then we can construct a
proof that $n$ is composite using $a$ as a witness.

If WITNESS returns TRUE from line 8, then it has discovered that
$x_t = a^{n-1} \bmod n \ne 1$. If $n$ is prime, however, we have by Fermat’s theorem
(Theorem 31.31) that $a^{n-1} \equiv 1 \pmod{n}$ for all $a \in \mathbb{Z}_n^+$. Therefore, $n$ cannot be prime,
and the equation $a^{n-1} \bmod n \ne 1$ proves this fact.

If WITNESS returns TRUE from line 6, then it has discovered that $x_{i-1}$ is a
nontrivial square root of 1, modulo $n$, since we have that $x_{i-1} \not\equiv \pm 1 \pmod{n}$ yet
$x_i \equiv x_{i-1}^2 \equiv 1 \pmod{n}$. Corollary 31.35 states that only if $n$ is composite can
there exist a nontrivial square root of 1 modulo $n$, so that demonstrating that $x_{i-1}$
is a nontrivial square root of 1 modulo $n$ proves that $n$ is composite.

This completes our proof of the correctness of WITNESS. If we find that the call
WITNESS(a, n) returns TRUE, then $n$ is surely composite, and the witness $a$, along
with the reason that the procedure returns TRUE (did it return from line 6 or from
line 8?), provides a proof that $n$ is composite.

At this point, we briefly present an alternative description of the behavior of
WITNESS as a function of the sequence $X = \langle x_0, x_1, \ldots, x_t \rangle$, which we shall find
useful later on, when we analyze the efficiency of the Miller-Rabin primality test.
Note that if $x_i = 1$ for some $0 \le i < t$, WITNESS might not compute the rest
of the sequence. If it were to do so, however, each value $x_{i+1}, x_{i+2}, \ldots, x_t$ would
be 1, and we consider these positions in the sequence $X$ as being all 1s. We have
four cases:

1. $X = \langle \ldots, d \rangle$, where $d \ne 1$: the sequence $X$ does not end in 1. Return TRUE
in line 8; $a$ is a witness to the compositeness of $n$ (by Fermat’s theorem).

2. $X = \langle 1, 1, \ldots, 1 \rangle$: the sequence $X$ is all 1s. Return FALSE; $a$ is not a witness
to the compositeness of $n$.

3. $X = \langle \ldots, -1, 1, \ldots, 1 \rangle$: the sequence $X$ ends in 1, and the last non-1 is equal
to $-1$. Return FALSE; $a$ is not a witness to the compositeness of $n$.

4. $X = \langle \ldots, d, 1, \ldots, 1 \rangle$, where $d \ne \pm 1$: the sequence $X$ ends in 1, but the last
non-1 is not $-1$. Return TRUE in line 6; $a$ is a witness to the compositeness
of $n$, since $d$ is a nontrivial square root of 1.
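The four cases can be made concrete by computing the whole sequence $X$ and inspecting it. The following Python sketch does that. Unlike WITNESS, it never stops early, and the names `witness_sequence` and `classify` are assumptions made for this illustration.

```python
def witness_sequence(a, n):
    """Return the full sequence X = <x_0, ..., x_t> for odd n > 2."""
    t, u = 0, n - 1
    while u % 2 == 0:          # write n - 1 = 2^t * u with u odd
        t, u = t + 1, u // 2
    xs = [pow(a, u, n)]
    for _ in range(t):
        xs.append(xs[-1] * xs[-1] % n)
    return xs

def classify(a, n):
    """Return (case, is_witness) according to the four cases above."""
    xs = witness_sequence(a, n)
    if xs[-1] != 1:
        return 1, True                     # case 1: X does not end in 1
    non_ones = [x for x in xs if x != 1]
    if not non_ones:
        return 2, False                    # case 2: X is all 1s
    if non_ones[-1] == n - 1:
        return 3, False                    # case 3: last non-1 is -1
    return 4, True                         # case 4: nontrivial square root of 1
```

Because every entry after the first 1 is also 1, the non-1 entries form a prefix of $X$, so the last of them is the value that was squared to produce the first 1.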

We  now  examine  the  Miller-Rabin  primality  test  based  on  the  use  of  WITNESS. 
Again,  we  assume  that  n  is  an  odd  integer  greater  than  2. 

MILLER-RABIN(n, s)

1  for j = 1 to s
2      a = RANDOM(1, n - 1)
3      if WITNESS(a, n)
4          return COMPOSITE    // definitely
5  return PRIME                // almost surely

The procedure MILLER-RABIN is a probabilistic search for a proof that $n$ is
composite. The main loop (beginning on line 1) picks up to $s$ random values of $a$
from $\mathbb{Z}_n^+$ (line 2). If one of the $a$’s picked is a witness to the compositeness of $n$,
then MILLER-RABIN returns COMPOSITE on line 4. Such a result is always
correct, by the correctness of WITNESS. If MILLER-RABIN finds no witness in $s$
trials, then the procedure assumes that this is because no witnesses exist, and
therefore it assumes that $n$ is prime. We shall see that this result is likely to be correct
if $s$ is large enough, but that there is still a tiny chance that the procedure may be
unlucky in its choice of $a$’s and that witnesses do exist even though none has been
found.

To illustrate the operation of MILLER-RABIN, let $n$ be the Carmichael
number 561, so that $n - 1 = 560 = 2^4 \cdot 35$, $t = 4$, and $u = 35$. If the
procedure chooses $a = 7$ as a base, Figure 31.4 in Section 31.6 shows that
WITNESS computes $x_0 \equiv a^{35} \equiv 241 \pmod{561}$ and thus computes the sequence
$X = \langle 241, 298, 166, 67, 1 \rangle$. Thus, WITNESS discovers a nontrivial square root
of 1 in the last squaring step, since $a^{280} \equiv 67 \pmod{n}$ and $a^{560} \equiv 1 \pmod{n}$.
Therefore, $a = 7$ is a witness to the compositeness of $n$, WITNESS(7, n) returns
TRUE, and MILLER-RABIN returns COMPOSITE.
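A direct Python transcription of WITNESS and MILLER-RABIN might look like the sketch below. It assumes Python's built-in `pow(a, u, n)` in place of MODULAR-EXPONENTIATION and `random.randint` in place of RANDOM. The optional `rng` parameter is an addition for this illustration, not part of the pseudocode.

```python
import random

def witness(a, n):
    """Return True if a proves that the odd integer n > 2 is composite."""
    t, u = 0, n - 1
    while u % 2 == 0:                 # n - 1 = 2^t * u with u odd
        t, u = t + 1, u // 2
    x_prev = pow(a, u, n)
    for _ in range(t):
        x = x_prev * x_prev % n
        if x == 1 and x_prev != 1 and x_prev != n - 1:
            return True               # nontrivial square root of 1
        x_prev = x
    return x_prev != 1                # x_t != 1 violates Fermat's theorem

def miller_rabin(n, s, rng=random):
    """Return 'COMPOSITE' (always correct) or 'PRIME' (correct with high probability)."""
    for _ in range(s):
        a = rng.randint(1, n - 1)     # inclusive at both ends, like RANDOM(1, n - 1)
        if witness(a, n):
            return "COMPOSITE"
    return "PRIME"
```

For instance, `witness(7, 561)` reproduces the example above, and `miller_rabin(561, 50)` draws up to 50 random bases.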

If $n$ is a $\beta$-bit number, MILLER-RABIN requires $O(s\beta)$ arithmetic operations
and $O(s\beta^3)$ bit operations, since it requires asymptotically no more work than $s$
modular exponentiations.

Error  rate  of  the  Miller-Rabin  primality  test 

If MILLER-RABIN returns PRIME, then there is a very slim chance that it has made
an error. Unlike PSEUDOPRIME, however, the chance of error does not depend
on $n$; there are no bad inputs for this procedure. Rather, it depends on the size of $s$
and  the  “luck  of  the  draw”  in  choosing  base  values  a.  Moreover,  since  each  test  is 
more  stringent  than  a  simple  check  of  equation  (31.40),  we  can  expect  on  general 
principles  that  the  error  rate  should  be  small  for  randomly  chosen  integers  n.  The 
following  theorem  presents  a  more  precise  argument. 

Theorem  31.38 

If $n$ is an odd composite number, then the number of witnesses to the
compositeness of $n$ is at least $(n - 1)/2$.

Proof The proof shows that the number of nonwitnesses is at most $(n - 1)/2$,
which implies the theorem.

We start by claiming that any nonwitness must be a member of $\mathbb{Z}_n^*$. Why?
Consider any nonwitness $a$. It must satisfy $a^{n-1} \equiv 1 \pmod{n}$ or, equivalently,
$a \cdot a^{n-2} \equiv 1 \pmod{n}$. Thus, the equation $ax \equiv 1 \pmod{n}$ has a solution,
namely $a^{n-2}$. By Corollary 31.21, $\gcd(a, n) \mid 1$, which in turn implies that
$\gcd(a, n) = 1$. Therefore, $a$ is a member of $\mathbb{Z}_n^*$; all nonwitnesses belong to $\mathbb{Z}_n^*$.

To complete the proof, we show that not only are all nonwitnesses contained
in $\mathbb{Z}_n^*$, they are all contained in a proper subgroup $B$ of $\mathbb{Z}_n^*$ (recall that we say $B$
is a proper subgroup of $\mathbb{Z}_n^*$ when $B$ is a subgroup of $\mathbb{Z}_n^*$ but $B$ is not equal to $\mathbb{Z}_n^*$).
By Corollary 31.16, we then have $|B| \le |\mathbb{Z}_n^*| / 2$. Since $|\mathbb{Z}_n^*| \le n - 1$, we obtain
$|B| \le (n - 1)/2$. Therefore, the number of nonwitnesses is at most $(n - 1)/2$, so
that the number of witnesses must be at least $(n - 1)/2$.

We now show how to find a proper subgroup $B$ of $\mathbb{Z}_n^*$ containing all of the
nonwitnesses. We break the proof into two cases.

Case 1: There exists an $x \in \mathbb{Z}_n^*$ such that

$$x^{n-1} \not\equiv 1 \pmod{n} .$$



In other words, $n$ is not a Carmichael number. Because, as we noted earlier,
Carmichael numbers are extremely rare, case 1 is the main case that arises “in
practice” (e.g., when $n$ has been chosen randomly and is being tested for primality).

Let $B = \{b \in \mathbb{Z}_n^* : b^{n-1} \equiv 1 \pmod{n}\}$. Clearly, $B$ is nonempty, since $1 \in B$.
Since $B$ is closed under multiplication modulo $n$, we have that $B$ is a subgroup
of $\mathbb{Z}_n^*$ by Theorem 31.14. Note that every nonwitness belongs to $B$, since a
nonwitness $a$ satisfies $a^{n-1} \equiv 1 \pmod{n}$. Since $x \in \mathbb{Z}_n^* - B$, we have that $B$ is a
proper subgroup of $\mathbb{Z}_n^*$.
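For a small modulus, the subgroup $B$ of case 1 can be listed explicitly. The Python sketch below does so by brute force. The helper names `units` and `fermat_subgroup` are illustrative assumptions, and $n = 15$ is simply a convenient small composite that is not a Carmichael number.

```python
import math

def units(n):
    """Elements of Z_n^*: the integers in [1, n) that are coprime to n."""
    return [a for a in range(1, n) if math.gcd(a, n) == 1]

def fermat_subgroup(n):
    """B = {b in Z_n^* : b^(n-1) = 1 (mod n)}."""
    return [b for b in units(n) if pow(b, n - 1, n) == 1]

if __name__ == "__main__":
    n = 15
    B = fermat_subgroup(n)
    print(B, len(units(n)))
    # Closure under multiplication modulo n, as used in the proof:
    print(all((x * y) % n in B for x in B for y in B))
```

For $n = 15$ the set $B$ has 4 of the 8 elements of $\mathbb{Z}_{15}^*$, so the bound $|B| \le |\mathbb{Z}_n^*| / 2$ holds with equality in this instance.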

Case 2: For all $x \in \mathbb{Z}_n^*$,

$$x^{n-1} \equiv 1 \pmod{n} . \tag{31.41}$$

In other words, $n$ is a Carmichael number. This case is extremely rare in
practice. However, the Miller-Rabin test (unlike a pseudo-primality test) can efficiently
determine that Carmichael numbers are composite, as we now show.

In this case, $n$ cannot be a prime power. To see why, let us suppose to the
contrary that $n = p^e$, where $p$ is a prime and $e > 1$. We derive a contradiction
as follows. Since we assume that $n$ is odd, $p$ must also be odd. Theorem 31.32
implies that $\mathbb{Z}_n^*$ is a cyclic group: it contains a generator $g$ such that
$\mathrm{ord}_n(g) = |\mathbb{Z}_n^*| = \phi(n) = p^e (1 - 1/p) = (p - 1) p^{e-1}$. (The formula for $\phi(n)$ comes from
equation (31.20).) By equation (31.41), we have $g^{n-1} \equiv 1 \pmod{n}$. Then the
discrete logarithm theorem (Theorem 31.33, taking $y = 0$) implies that
$n - 1 \equiv 0 \pmod{\phi(n)}$, or

$$(p - 1) p^{e-1} \mid p^e - 1 .$$

This is a contradiction for $e > 1$, since $(p - 1) p^{e-1}$ is divisible by the prime $p$
but $p^e - 1$ is not. Thus, $n$ is not a prime power.

Since the odd composite number $n$ is not a prime power, we decompose it into
a product $n_1 n_2$, where $n_1$ and $n_2$ are odd numbers greater than 1 that are relatively
prime to each other. (There may be several ways to decompose $n$, and it does not
matter which one we choose. For example, if $n = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}$, then we can
choose $n_1 = p_1^{e_1}$ and $n_2 = p_2^{e_2} p_3^{e_3} \cdots p_r^{e_r}$.)

Recall that we define $t$ and $u$ so that $n - 1 = 2^t u$, where $t \ge 1$ and $u$ is odd, and
that for an input $a$, the procedure WITNESS computes the sequence

$$X = \langle a^u, a^{2u}, a^{2^2 u}, \ldots, a^{2^t u} \rangle$$

(all computations are performed modulo $n$).

Let us call a pair $(v, j)$ of integers acceptable if $v \in \mathbb{Z}_n^*$, $j \in \{0, 1, \ldots, t\}$, and

$$v^{2^j u} \equiv -1 \pmod{n} .$$

Acceptable pairs certainly exist since $u$ is odd; we can choose $v = n - 1$ and
$j = 0$, so that $(n - 1, 0)$ is an acceptable pair. Now pick the largest possible $j$ such
that there exists an acceptable pair $(v, j)$, and fix $v$ so that $(v, j)$ is an acceptable
pair. Let

$$B = \{x \in \mathbb{Z}_n^* : x^{2^j u} \equiv \pm 1 \pmod{n}\} .$$

Since $B$ is closed under multiplication modulo $n$, it is a subgroup of $\mathbb{Z}_n^*$. By
Theorem 31.15, therefore, $|B|$ divides $|\mathbb{Z}_n^*|$. Every nonwitness must be a member of $B$,
since the sequence $X$ produced by a nonwitness must either be all 1s or else contain
a $-1$ no later than the $j$th position, by the maximality of $j$. (If $(a, j')$ is acceptable,
where $a$ is a nonwitness, we must have $j' \le j$ by how we chose $j$.)

We now use the existence of $v$ to demonstrate that there exists a $w \in \mathbb{Z}_n^* - B$,
and hence that $B$ is a proper subgroup of $\mathbb{Z}_n^*$. Since $v^{2^j u} \equiv -1 \pmod{n}$, we have
$v^{2^j u} \equiv -1 \pmod{n_1}$ by Corollary 31.29 to the Chinese remainder theorem. By
Corollary 31.28, there exists a $w$ simultaneously satisfying the equations

$$w \equiv v \pmod{n_1} ,$$
$$w \equiv 1 \pmod{n_2} .$$

Therefore,

$$w^{2^j u} \equiv -1 \pmod{n_1} ,$$
$$w^{2^j u} \equiv 1 \pmod{n_2} .$$

By Corollary 31.29, $w^{2^j u} \not\equiv 1 \pmod{n_1}$ implies $w^{2^j u} \not\equiv 1 \pmod{n}$, and
$w^{2^j u} \not\equiv -1 \pmod{n_2}$ implies $w^{2^j u} \not\equiv -1 \pmod{n}$. Hence, we conclude that
$w^{2^j u} \not\equiv \pm 1 \pmod{n}$, and so $w \notin B$.

It remains to show that $w \in \mathbb{Z}_n^*$, which we do by first working separately
modulo $n_1$ and modulo $n_2$. Working modulo $n_1$, we observe that since $v \in \mathbb{Z}_n^*$, we
have that $\gcd(v, n) = 1$, and so also $\gcd(v, n_1) = 1$; if $v$ does not have any
common divisors with $n$, then it certainly does not have any common divisors with $n_1$.
Since $w \equiv v \pmod{n_1}$, we see that $\gcd(w, n_1) = 1$. Working modulo $n_2$, we
observe that $w \equiv 1 \pmod{n_2}$ implies $\gcd(w, n_2) = 1$. To combine these results,
we use Theorem 31.6, which implies that $\gcd(w, n_1 n_2) = \gcd(w, n) = 1$. That is,
$w \in \mathbb{Z}_n^*$.

Therefore $w \in \mathbb{Z}_n^* - B$, and we finish case 2 with the conclusion that $B$ is a
proper subgroup of $\mathbb{Z}_n^*$.

In either case, we see that the number of witnesses to the compositeness of $n$ is
at least $(n - 1)/2$. ■

Theorem  31.39 

For any odd integer $n > 2$ and positive integer $s$, the probability that
MILLER-RABIN(n, s) errs is at most $2^{-s}$.

Proof Using Theorem 31.38, we see that if $n$ is composite, then each execution of
the for loop of lines 1-4 has a probability of at least 1/2 of discovering a witness $x$
to the compositeness of $n$. MILLER-RABIN makes an error only if it is so unlucky
as to miss discovering a witness to the compositeness of $n$ on each of the $s$ iterations
of the main loop. The probability of such a sequence of misses is at most $2^{-s}$. ■

If $n$ is prime, MILLER-RABIN always reports PRIME, and if $n$ is composite, the
chance that MILLER-RABIN reports PRIME is at most $2^{-s}$.

When applying MILLER-RABIN to a large randomly chosen integer $n$, however,
we need to consider as well the prior probability that $n$ is prime, in order to
correctly interpret MILLER-RABIN’s result. Suppose that we fix a bit length $\beta$ and
choose at random an integer $n$ of length $\beta$ bits to be tested for primality. Let $A$
denote the event that $n$ is prime. By the prime number theorem (Theorem 31.37),
the probability that $n$ is prime is approximately

$$\Pr\{A\} \approx 1 / \ln n \approx 1.443 / \beta .$$


Now let $B$ denote the event that MILLER-RABIN returns PRIME. We have that
$\Pr\{\bar{B} \mid A\} = 0$ (or equivalently, that $\Pr\{B \mid A\} = 1$) and $\Pr\{B \mid \bar{A}\} \le 2^{-s}$ (or
equivalently, that $\Pr\{\bar{B} \mid \bar{A}\} \ge 1 - 2^{-s}$).

But what is $\Pr\{A \mid B\}$, the probability that $n$ is prime, given that
MILLER-RABIN has returned PRIME? By the alternate form of Bayes’s theorem
(equation (C.18)) we have

$$\Pr\{A \mid B\} = \frac{\Pr\{A\} \Pr\{B \mid A\}}{\Pr\{A\} \Pr\{B \mid A\} + \Pr\{\bar{A}\} \Pr\{B \mid \bar{A}\}} \approx \frac{1}{1 + 2^{-s}(\ln n - 1)} .$$

This probability does not exceed 1/2 until $s$ exceeds $\lg(\ln n - 1)$. Intuitively, that
many initial trials are needed just for the confidence derived from failing to find a
witness to the compositeness of $n$ to overcome the prior bias in favor of $n$ being
composite. For a number with $\beta = 1024$ bits, this initial testing requires about

$$\lg(\ln n - 1) \approx \lg(\beta / 1.443) \approx 9$$

trials. In any case, choosing $s = 50$ should suffice for almost any imaginable
application.
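The approximation for $\Pr\{A \mid B\}$ is easy to tabulate. The sketch below evaluates $1 / (1 + 2^{-s}(\ln n - 1))$ with $\ln n$ taken as $\beta \ln 2$, the value for $n = 2^{\beta}$. The function names are assumptions for this illustration, and the formula treats $\Pr\{B \mid \bar{A}\}$ as equal to its upper bound $2^{-s}$, so the result is a pessimistic estimate rather than an exact probability.

```python
import math

def posterior_prime(bits, s):
    """Approximate Pr{A | B} = 1 / (1 + 2^-s (ln n - 1)) for n near 2^bits."""
    ln_n = bits * math.log(2)
    return 1.0 / (1.0 + 2.0 ** (-s) * (ln_n - 1.0))

def trials_to_break_even(bits):
    """Smallest s for which the approximate posterior exceeds 1/2."""
    s = 0
    while posterior_prime(bits, s) <= 0.5:
        s += 1
    return s

if __name__ == "__main__":
    for s in (1, 5, 9, 10, 20, 50):
        print(s, posterior_prime(1024, s))
    print(trials_to_break_even(1024))
```

For $\beta = 1024$ the threshold $\lg(\ln n - 1)$ is a little under 9.5, so the first whole number of trials that pushes the estimate past 1/2 is 10.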

In fact, the situation is much better. If we are trying to find large primes by
applying MILLER-RABIN to large randomly chosen odd integers, then choosing
a small value of $s$ (say 3) is very unlikely to lead to erroneous results, though
we won’t prove it here. The reason is that for a randomly chosen odd composite
integer $n$, the expected number of nonwitnesses to the compositeness of $n$ is likely
to be very much smaller than $(n - 1)/2$.

If the integer $n$ is not chosen randomly, however, the best that can be proven is
that the number of nonwitnesses is at most $(n - 1)/4$, using an improved version
of Theorem 31.38. Furthermore, there do exist integers $n$ for which the number of
nonwitnesses is $(n - 1)/4$.
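For small $n$, the nonwitnesses can simply be counted. The Python sketch below does this by trying every base in $\mathbb{Z}_n^+$. The name `nonwitness_count` is an assumption for this illustration, and `witness` is a transcription of the WITNESS procedure using the built-in `pow`.

```python
def witness(a, n):
    """Return True if a proves that the odd integer n > 2 is composite."""
    t, u = 0, n - 1
    while u % 2 == 0:                 # n - 1 = 2^t * u with u odd
        t, u = t + 1, u // 2
    x_prev = pow(a, u, n)
    for _ in range(t):
        x = x_prev * x_prev % n
        if x == 1 and x_prev != 1 and x_prev != n - 1:
            return True
        x_prev = x
    return x_prev != 1

def nonwitness_count(n):
    """Number of bases a in {1, ..., n - 1} for which witness(a, n) is False."""
    return sum(1 for a in range(1, n) if not witness(a, n))

if __name__ == "__main__":
    for n in (9, 15, 91, 561):
        print(n, nonwitness_count(n), (n - 1) / 4)
```

The case $n = 9$ has exactly two nonwitnesses, 1 and 8, which meets the $(n - 1)/4$ bound with equality. The Carmichael number 561 has only 10 nonwitnesses among its 560 candidate bases.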

Exercises 


31.8-1

Prove  that  if  an  odd  integer  n  >  1  is  not  a  prime  or  a  prime  power,  then  there  exists 
a  nontrivial  square  root  of  1  modulo  n. 

31.8-2 ★

It is possible to strengthen Euler’s theorem slightly to the form

$$a^{\lambda(n)} \equiv 1 \pmod{n} \text{ for all } a \in \mathbb{Z}_n^* ,$$

where $n = p_1^{e_1} \cdots p_r^{e_r}$ and $\lambda(n)$ is defined by

$$\lambda(n) = \mathrm{lcm}(\phi(p_1^{e_1}), \ldots, \phi(p_r^{e_r})) . \tag{31.42}$$

Prove that $\lambda(n) \mid \phi(n)$. A composite number $n$ is a Carmichael number if
$\lambda(n) \mid n - 1$. The smallest Carmichael number is $561 = 3 \cdot 11 \cdot 17$; here,
$\lambda(n) = \mathrm{lcm}(2, 10, 16) = 80$, which divides 560. Prove that Carmichael
numbers must be both “square-free” (not divisible by the square of any prime) and the
product of at least three primes. (For this reason, they are not very common.)


31.8-3 

Prove that if $x$ is a nontrivial square root of 1, modulo $n$, then $\gcd(x - 1, n)$ and
$\gcd(x + 1, n)$ are both nontrivial divisors of $n$.


★  31.9  Integer  factorization 

Suppose we have an integer $n$ that we wish to factor, that is, to decompose into a
product of primes. The primality test of the preceding section may tell us that $n$ is
composite, but it does not tell us the prime factors of $n$. Factoring a large integer $n$
seems to be much more difficult than simply determining whether $n$ is prime or
composite. Even with today’s supercomputers and the best algorithms to date, we
cannot feasibly factor an arbitrary 1024-bit number.

Pollard’s rho heuristic

Trial division by all integers up to $R$ is guaranteed to factor completely any number
up to $R^2$. For the same amount of work, the following procedure, POLLARD-RHO,
factors any number up to $R^4$ (unless we are unlucky). Since the procedure is only
a heuristic, neither its running time nor its success is guaranteed, although the
procedure is highly effective in practice. Another advantage of the POLLARD-RHO
procedure is that it uses only a constant number of memory locations. (If you
wanted to, you could easily implement POLLARD-RHO on a programmable pocket
calculator to find factors of small numbers.)

POLLARD-RHO(n)

1   i = 1
2   x_1 = RANDOM(0, n - 1)
3   y = x_1
4   k = 2
5   while TRUE
6       i = i + 1
7       x_i = (x_{i-1}^2 - 1) mod n
8       d = gcd(y - x_i, n)
9       if d ≠ 1 and d ≠ n
10          print d
11      if i == k
12          y = x_i
13          k = 2k

The procedure works as follows. Lines 1-2 initialize $i$ to 1 and $x_1$ to a randomly
chosen value in $\mathbb{Z}_n$. The while loop beginning on line 5 iterates forever, searching
for factors of $n$. During each iteration of the while loop, line 7 uses the recurrence

$$x_i = (x_{i-1}^2 - 1) \bmod n \tag{31.43}$$

to produce the next value of $x_i$ in the infinite sequence

$$x_1, x_2, x_3, x_4, \ldots , \tag{31.44}$$

with line 6 correspondingly incrementing $i$. The pseudocode is written using
subscripted variables $x_i$ for clarity, but the program works the same if all of the
subscripts are dropped, since only the most recent value of $x_i$ needs to be maintained.
With this modification, the procedure uses only a constant number of memory
locations.

Every so often, the program saves the most recently generated $x_i$ value in the
variable $y$. Specifically, the values that are saved are the ones whose subscripts are
powers of 2:

$$x_1, x_2, x_4, x_8, x_{16}, \ldots .$$

Line 3 saves the value $x_1$, and line 12 saves $x_k$ whenever $i$ is equal to $k$. The
variable $k$ is initialized to 2 in line 4, and line 13 doubles it whenever line 12
updates $y$. Therefore, $k$ follows the sequence $2, 4, 8, \ldots$ and always gives the
subscript of the next value $x_k$ to be saved in $y$.

Lines 8-10 try to find a factor of $n$, using the saved value of $y$ and the
current value of $x_i$. Specifically, line 8 computes the greatest common divisor
$d = \gcd(y - x_i, n)$. If line 9 finds $d$ to be a nontrivial divisor of $n$, then line 10
prints $d$.
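The procedure translates almost line for line into Python. The sketch below departs from the pseudocode in two labeled ways, both assumptions made so that the example terminates: it returns the first nontrivial divisor instead of printing divisors forever, and it gives up after `max_steps` iterations. The optional `x1` parameter, which fixes the starting value, is also an addition for reproducibility.

```python
import math
import random

def pollard_rho(n, x1=None, max_steps=100_000):
    """Return the first nontrivial divisor found, or None if none is found in max_steps."""
    x = random.randrange(n) if x1 is None else x1   # x_1 in Z_n
    y, k, i = x, 2, 1
    while i < max_steps:
        i += 1
        x = (x * x - 1) % n          # recurrence (31.43), subscripts dropped
        d = math.gcd(y - x, n)       # math.gcd accepts a negative argument
        if d != 1 and d != n:
            return d                 # the pseudocode prints d and keeps going
        if i == k:
            y, k = x, 2 * k          # save x_k and double k
    return None
```

For example, with $n = 1387 = 19 \cdot 73$ and starting value $x_1 = 2$, the divisor 19 appears at $i = 7$, when $y = x_4 = 63$ and $x_7 = 177$, since $\gcd(63 - 177, 1387) = 19$.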

This procedure for finding a factor may seem somewhat mysterious at first.
Note, however, that POLLARD-RHO never prints an incorrect answer; any
number it prints is a nontrivial divisor of $n$. POLLARD-RHO might not print anything
at all, though; it comes with no guarantee that it will print any divisors. We shall
see, however, that we have good reason to expect POLLARD-RHO to print a
factor $p$ of $n$ after $\Theta(\sqrt{p})$ iterations of the while loop. Thus, if $n$ is composite, we
can expect this procedure to discover enough divisors to factor $n$ completely after
approximately $n^{1/4}$ updates, since every prime factor $p$ of $n$ except possibly the
largest one is less than $\sqrt{n}$.

We begin our analysis of how this procedure behaves by studying how long
it takes a random sequence modulo $n$ to repeat a value. Since $\mathbb{Z}_n$ is
finite, and since each value in the sequence (31.44) depends only on the
previous value, the sequence (31.44) eventually repeats itself. Once we reach
an $x_i$ such that $x_i = x_j$ for some $j < i$, we are in a cycle, since
$x_{i+1} = x_{j+1}$, $x_{i+2} = x_{j+2}$, and so on. The reason for the name
“rho heuristic” is that, as Figure 31.7 shows, we can draw the sequence
$x_1, x_2, \ldots, x_{j-1}$ as the “tail” of the rho and the cycle
$x_j, x_{j+1}, \ldots, x_i$ as the “body” of the rho.

Let us consider the question of how long it takes for the sequence of $x_i$ to
repeat. This information is not exactly what we need, but we shall see later how
to modify the argument. For the purpose of this estimation, let us assume that
the function

$f_n(x) = (x^2 - 1) \bmod n$

behaves like a “random” function. Of course, it is not really random, but this
assumption yields results consistent with the observed behavior of Pollard-Rho.
We can then consider each $x_i$ to have been independently drawn from
$\mathbb{Z}_n$ according to a uniform distribution on $\mathbb{Z}_n$. By the
birthday-paradox analysis of Section 5.4.1, we expect $\Theta(\sqrt{n})$ steps
to be taken before the sequence cycles.
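To get a feel for the $\Theta(\sqrt{n})$ estimate, we can simulate the idealized model directly: draw values uniformly from $\mathbb{Z}_n$ until one repeats, and compare the average number of draws with $\sqrt{n}$. The sketch below is an experiment under that random-function assumption, not an analysis of $f_n$ itself; the function names, the trial count, and the fixed seed are arbitrary choices of ours.

```python
import random
from math import sqrt

def steps_until_repeat(n, rng):
    """Draw uniformly from Z_n until a value repeats; return the number of draws."""
    seen = set()
    draws = 0
    while True:
        draws += 1
        v = rng.randrange(n)
        if v in seen:
            return draws
        seen.add(v)

def mean_steps(n, trials=200, seed=0):
    """Average number of draws before the first repeat, over several trials."""
    rng = random.Random(seed)
    return sum(steps_until_repeat(n, rng) for _ in range(trials)) / trials

# The ratio mean/sqrt(n) should stay roughly constant as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, round(mean_steps(n) / sqrt(n), 2))
```

If the model is right, the printed ratios stay near a constant rather than growing with $n$.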

Now for the required modification. Let $p$ be a nontrivial factor of $n$ such
that $\gcd(p, n/p) = 1$. For example, if $n$ has the factorization
$n = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}$, then we may take $p$ to be
$p_1^{e_1}$. (If $e_1 = 1$, then $p$ is just the smallest prime factor of $n$,
a good example to keep in mind.)

Figure 31.7 Pollard's rho heuristic. (a) The values produced by the recurrence
$x_{i+1} = (x_i^2 - 1) \bmod 1387$, starting with $x_1 = 2$. The prime
factorization of 1387 is 19 · 73. The heavy arrows indicate the iteration steps
that are executed before the factor 19 is discovered. The light arrows point to
unreached values in the iteration, to illustrate the “rho” shape. The shaded
values are the $y$ values stored by Pollard-Rho. The factor 19 is discovered
upon reaching $x_7 = 177$, when $\gcd(63 - 177, 1387) = 19$ is computed. The
first $x$ value that would be repeated is 1186, but the factor 19 is discovered
before this value is repeated. (b) The values produced by the same recurrence,
modulo 19. Every value $x_i$ given in part (a) is equivalent, modulo 19, to the
value $x'_i$ shown here. For example, both $x_4 = 63$ and $x_7 = 177$ are
equivalent to 6, modulo 19. (c) The values produced by the same recurrence,
modulo 73. Every value $x_i$ given in part (a) is equivalent, modulo 73, to the
value $x''_i$ shown here. By the Chinese remainder theorem, each node in part (a)
corresponds to a pair of nodes, one from part (b) and one from part (c).

The sequence $\langle x_i \rangle$ induces a corresponding sequence
$\langle x'_i \rangle$ modulo $p$, where

$x'_i = x_i \bmod p$

for all $i$.

Furthermore, because $f_n$ is defined using only arithmetic operations (squaring
and subtraction) modulo $n$, we can compute $x'_{i+1}$ from $x'_i$; the
“modulo $p$” view of the sequence is a smaller version of what is happening
modulo $n$:

$$
\begin{aligned}
x'_{i+1} &= x_{i+1} \bmod p \\
         &= f_n(x_i) \bmod p \\
         &= ((x_i^2 - 1) \bmod n) \bmod p \\
         &= (x_i^2 - 1) \bmod p \qquad \text{(by Exercise 31.1-7)} \\
         &= ((x_i \bmod p)^2 - 1) \bmod p \\
         &= ((x'_i)^2 - 1) \bmod p \\
         &= f_p(x'_i) .
\end{aligned}
$$

Thus, although we are not explicitly computing the sequence
$\langle x'_i \rangle$, this sequence is well defined and obeys the same
recurrence as the sequence $\langle x_i \rangle$.
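We can check this claim numerically on the example of Figure 31.7. The sketch below (the helper name `rho_sequence` is ours) generates the sequence modulo $n = 1387$, reduces it modulo $p = 19$, and compares the result with the sequence generated directly by the same recurrence modulo 19.

```python
def rho_sequence(modulus, x1, length):
    """Return [x_1, ..., x_length] for x_{i+1} = (x_i^2 - 1) mod modulus."""
    xs = [x1 % modulus]
    while len(xs) < length:
        xs.append((xs[-1] ** 2 - 1) % modulus)
    return xs

n, p = 1387, 19
xs = rho_sequence(n, 2, 12)      # the sequence modulo n
xs_p = rho_sequence(p, 2, 12)    # the sequence modulo p, computed directly

# Reducing the mod-n sequence modulo p gives exactly the mod-p sequence.
assert [x % p for x in xs] == xs_p

print(xs[:7])     # [2, 3, 8, 63, 1194, 1186, 177]
print(xs_p[:7])   # [2, 3, 8, 6, 16, 8, 6]
```

Note that the sequence modulo 19 already repeats ($x'_4 = x'_7 = 6$) while the values modulo 1387 are still distinct.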

Reasoning as before, we find that the expected number of steps before the
sequence $\langle x'_i \rangle$ repeats is $\Theta(\sqrt{p})$. If $p$ is small
compared to $n$, the sequence $\langle x'_i \rangle$ might repeat much more
quickly than the sequence $\langle x_i \rangle$. Indeed, as parts (b) and (c) of
Figure 31.7 show, the $\langle x'_i \rangle$ sequence repeats as soon as two
elements of the sequence $\langle x_i \rangle$ are merely equivalent modulo $p$,
rather than equivalent modulo $n$.

Let $t$ denote the index of the first repeated value in the
$\langle x'_i \rangle$ sequence, and let $u > 0$ denote the length of the cycle
that has been thereby produced. That is, $t$ and $u > 0$ are the smallest values
such that $x'_{t+i} = x'_{t+u+i}$ for all $i \ge 0$. By the above arguments, the
expected values of $t$ and $u$ are both $\Theta(\sqrt{p})$. Note that if
$x'_{t+i} = x'_{t+u+i}$, then $p \mid (x_{t+u+i} - x_{t+i})$. Thus,
$\gcd(x_{t+u+i} - x_{t+i}, n) > 1$.

Therefore, once Pollard-Rho has saved as $y$ any value $x_k$ such that
$k \ge t$, then $y \bmod p$ is always on the cycle modulo $p$. (If a new value
is saved as $y$, that value is also on the cycle modulo $p$.) Eventually, $k$ is
set to a value that is greater than $u$, and the procedure then makes an entire
loop around the cycle modulo $p$ without changing the value of $y$. The
procedure then discovers a factor of $n$ when $x_i$ “runs into” the previously
stored value of $y$, modulo $p$, that is, when $x_i \equiv y \pmod{p}$.

Presumably, the factor found is the factor $p$, although it may occasionally
happen that a multiple of $p$ is discovered. Since the expected values of both
$t$ and $u$ are $\Theta(\sqrt{p})$, the expected number of steps required to
produce the factor $p$ is $\Theta(\sqrt{p})$.

This algorithm might not perform quite as expected, for two reasons. First, the
heuristic analysis of the running time is not rigorous, and it is possible that
the cycle of values, modulo $p$, could be much larger than $\sqrt{p}$. In this
case, the algorithm performs correctly but much more slowly than desired. In
practice, this issue seems to be moot. Second, the divisors of $n$ produced by
this algorithm might always be one of the trivial factors 1 or $n$. For example,
suppose that $n = pq$, where $p$ and $q$ are prime. It can happen that the
values of $t$ and $u$ for $p$ are identical with the values of $t$ and $u$
for $q$, and thus the factor $p$ is always revealed in the same gcd operation
that reveals the factor $q$. Since both factors are revealed at the same
time, the trivial factor $pq = n$ is revealed, which is useless. Again, this
problem seems to be insignificant in practice. If necessary, we can restart the
heuristic with a different recurrence of the form
$x_{i+1} = (x_i^2 - c) \bmod n$. (We should avoid the values $c = 0$ and $c = 2$
for reasons we will not go into here, but other values are fine.)

Of course, this analysis is heuristic and not rigorous, since the recurrence is
not really “random.” Nonetheless, the procedure performs well in practice, and
it seems to be as efficient as this heuristic analysis indicates. It is the
method of choice for finding small prime factors of a large number. To factor a
$\beta$-bit composite number $n$ completely, we only need to find all prime
factors less than $\lfloor n^{1/2} \rfloor$, and so we expect Pollard-Rho to
require at most $n^{1/4} = 2^{\beta/4}$ arithmetic operations and at most
$n^{1/4}\beta^2 = 2^{\beta/4}\beta^2$ bit operations. Pollard-Rho's ability to
find a small factor $p$ of $n$ with an expected number $\Theta(\sqrt{p})$ of
arithmetic operations is often its most appealing feature.

Exercises 


31.9-1 

Referring to the execution history shown in Figure 31.7(a), when does
Pollard-Rho print the factor 73 of 1387?


31.9-2 

Suppose that we are given a function $f : \mathbb{Z}_n \to \mathbb{Z}_n$ and an
initial value $x_0 \in \mathbb{Z}_n$. Define $x_i = f(x_{i-1})$ for
$i = 1, 2, \ldots$. Let $t$ and $u > 0$ be the smallest values such that
$x_{t+i} = x_{t+u+i}$ for $i = 0, 1, \ldots$. In the terminology of Pollard's
rho algorithm, $t$ is the length of the tail and $u$ is the length of the cycle
of the rho. Give an efficient algorithm to determine $t$ and $u$ exactly, and
analyze its running time.


31.9-3

How many steps would you expect Pollard-Rho to require to discover a factor
of the form $p^e$, where $p$ is prime and $e > 1$?


31.9-4 ⋆

One disadvantage of Pollard-Rho as written is that it requires one gcd
computation for each step of the recurrence. Instead, we could batch the gcd
computations by accumulating the product of several $x_i$ values in a row and
then using this product instead of $x_i$ in the gcd computation. Describe
carefully how you would implement this idea, why it works, and what batch size
you would pick as the most effective when working on a $\beta$-bit number $n$.


Problems


31-1  Binary  gcd  algorithm 

Most  computers  can  perform  the  operations  of  subtraction,  testing  the  parity  (odd 
or  even)  of  a  binary  integer,  and  halving  more  quickly  than  computing  remainders. 
This  problem  investigates  the  binary  gcd  algorithm,  which  avoids  the  remainder 
computations  used  in  Euclid’s  algorithm. 

a. Prove that if $a$ and $b$ are both even, then
   $\gcd(a, b) = 2 \cdot \gcd(a/2, b/2)$.

b. Prove that if $a$ is odd and $b$ is even, then $\gcd(a, b) = \gcd(a, b/2)$.

c. Prove that if $a$ and $b$ are both odd, then
   $\gcd(a, b) = \gcd((a - b)/2, b)$.

d. Design an efficient binary gcd algorithm for input integers $a$ and $b$,
   where $a \ge b$, that runs in $O(\lg a)$ time. Assume that each subtraction,
   parity test, and halving takes unit time.

31-2  Analysis  of  bit  operations  in  Euclid’s  algorithm 

a. Consider the ordinary “paper and pencil” algorithm for long division:
   dividing $a$ by $b$, which yields a quotient $q$ and remainder $r$. Show that
   this method requires $O((1 + \lg q) \lg b)$ bit operations.

b. Define $\mu(a, b) = (1 + \lg a)(1 + \lg b)$. Show that the number of bit
   operations performed by Euclid in reducing the problem of computing
   $\gcd(a, b)$ to that of computing $\gcd(b, a \bmod b)$ is at most
   $c(\mu(a, b) - \mu(b, a \bmod b))$ for some sufficiently large constant
   $c > 0$.

c. Show that Euclid(a, b) requires $O(\mu(a, b))$ bit operations in general and
   $O(\beta^2)$ bit operations when applied to two $\beta$-bit inputs.

31-3  Three  algorithms  for  Fibonacci  numbers 

This problem compares the efficiency of three methods for computing the $n$th
Fibonacci number $F_n$, given $n$. Assume that the cost of adding, subtracting,
or multiplying two numbers is $O(1)$, independent of the size of the numbers.

a. Show that the running time of the straightforward recursive method for
   computing $F_n$ based on recurrence (3.22) is exponential in $n$. (See, for
   example, the Fib procedure on page 775.)

b. Show how to compute $F_n$ in $O(n)$ time using memoization.

c. Show how to compute $F_n$ in $O(\lg n)$ time using only integer addition and
   multiplication. (Hint: Consider the matrix

   $$\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$$

   and its powers.)

d. Assume now that adding two $\beta$-bit numbers takes $\Theta(\beta)$ time and
   that multiplying two $\beta$-bit numbers takes $\Theta(\beta^2)$ time. What
   is the running time of these three methods under this more reasonable cost
   measure for the elementary arithmetic operations?

31-4 Quadratic residues

Let $p$ be an odd prime. A number $a \in \mathbb{Z}_p^*$ is a quadratic residue
if the equation $x^2 \equiv a \pmod{p}$ has a solution for the unknown $x$.

a. Show that there are exactly $(p - 1)/2$ quadratic residues, modulo $p$.

b. If $p$ is prime, we define the Legendre symbol $\left(\frac{a}{p}\right)$,
   for $a \in \mathbb{Z}_p^*$, to be 1 if $a$ is a quadratic residue modulo $p$
   and $-1$ otherwise. Prove that if $a \in \mathbb{Z}_p^*$, then

   $$\left(\frac{a}{p}\right) \equiv a^{(p-1)/2} \pmod{p} .$$

   Give an efficient algorithm that determines whether a given number $a$ is a
   quadratic residue modulo $p$. Analyze the efficiency of your algorithm.

c. Prove that if $p$ is a prime of the form $4k + 3$ and $a$ is a quadratic
   residue in $\mathbb{Z}_p^*$, then $a^{k+1} \bmod p$ is a square root of $a$,
   modulo $p$. How much time is required to find the square root of a quadratic
   residue $a$ modulo $p$?

d. Describe an efficient randomized algorithm for finding a nonquadratic
   residue, modulo an arbitrary prime $p$, that is, a member of
   $\mathbb{Z}_p^*$ that is not a quadratic residue. How many arithmetic
   operations does your algorithm require on average?


Chapter notes

Niven and Zuckerman [265] provide an excellent introduction to elementary
number theory. Knuth [210] contains a good discussion of algorithms for finding
the greatest common divisor, as well as other basic number-theoretic algorithms.
Bach [30] and Riesel [295] provide more recent surveys of computational number
theory. Dixon [91] gives an overview of factorization and primality testing. The
conference proceedings edited by Pomerance [280] contains several excellent
survey articles. More recently, Bach and Shallit [31] have provided an
exceptional overview of the basics of computational number theory.

Knuth [210] discusses the origin of Euclid's algorithm. It appears in Book 7,
Propositions 1 and 2, of the Greek mathematician Euclid's Elements, which was
written around 300 B.C. Euclid's description may have been derived from an
algorithm due to Eudoxus around 375 B.C. Euclid's algorithm may hold the honor
of being the oldest nontrivial algorithm; it is rivaled only by an algorithm for
multiplication known to the ancient Egyptians. Shallit [312] chronicles the
history of the analysis of Euclid's algorithm.

Knuth attributes a special case of the Chinese remainder theorem
(Theorem 31.27) to the Chinese mathematician Sun-Tsu, who lived sometime between
200 B.C. and A.D. 200; the date is quite uncertain. The same special case was
given by the Greek mathematician Nichomachus around A.D. 100. It was
generalized by Chhin Chiu-Shao in 1247. The Chinese remainder theorem was
finally stated and proved in its full generality by L. Euler in 1734.

The randomized primality-testing algorithm presented here is due to Miller [255]
and Rabin [289]; it is the fastest randomized primality-testing algorithm known,
to within constant factors. The proof of Theorem 31.39 is a slight adaptation of
one suggested by Bach [29]. A proof of a stronger result for Miller-Rabin
was given by Monier [258, 259]. For many years primality-testing was the classic
example of a problem where randomization appeared to be necessary to obtain
an efficient (polynomial-time) algorithm. In 2002, however, Agrawal, Kayal, and
Saxena [4] surprised everyone with their deterministic polynomial-time
primality-testing algorithm. Until then, the fastest deterministic
primality-testing algorithm known, due to Cohen and Lenstra [73], ran in time
$(\lg n)^{O(\lg \lg \lg n)}$ on input $n$, which is just slightly
superpolynomial. Nonetheless, for practical purposes randomized
primality-testing algorithms remain more efficient and are preferred.

The  problem  of  finding  large  “random”  primes  is  nicely  discussed  in  an  article 
by  Beauchemin,  Brassard,  Crepeau,  Goutier,  and  Pomerance  [36]. 

The concept of a public-key cryptosystem is due to Diffie and Hellman [87].
The RSA cryptosystem was proposed in 1977 by Rivest, Shamir, and Adleman
[296]. Since then, the field of cryptography has blossomed. Our understanding
of the RSA cryptosystem has deepened, and modern implementations use
significant refinements of the basic techniques presented here. In addition,
many new techniques have been developed for proving cryptosystems to be secure.
For example, Goldwasser and Micali [142] show that randomization can be an
effective tool in the design of secure public-key encryption schemes. For
signature schemes, Goldwasser, Micali, and Rivest [143] present a
digital-signature scheme for which every conceivable type of forgery is provably
as difficult as factoring. Menezes, van Oorschot, and Vanstone [254] provide an
overview of applied cryptography.

The  rho  heuristic  for  integer  factorization  was  invented  by  Pollard  [277].  The 
version  presented  here  is  a  variant  proposed  by  Brent  [56]. 

The best algorithms for factoring large numbers have a running time that grows
roughly exponentially with the cube root of the length of the number $n$ to be
factored. The general number-field sieve factoring algorithm (as developed by
Buhler, Lenstra, and Pomerance [57] as an extension of the ideas in the
number-field sieve factoring algorithm by Pollard [278] and Lenstra et al. [232]
and refined by Coppersmith [77] and others) is perhaps the most efficient such
algorithm in general for large inputs. Although it is difficult to give a
rigorous analysis of this algorithm, under reasonable assumptions we can derive
a running-time estimate of $L(1/3, n)^{1.902 + o(1)}$, where
$L(\alpha, n) = e^{(\ln n)^{\alpha} (\ln \ln n)^{1 - \alpha}}$.

The elliptic-curve method due to Lenstra [233] may be more effective for some
inputs than the number-field sieve method, since, like Pollard's rho method, it
can find a small prime factor $p$ quite quickly. With this method, the time to
find $p$ is estimated to be $L(1/2, p)^{\sqrt{2} + o(1)}$.


32 


String  Matching 


Text-editing programs frequently need to find all occurrences of a pattern in
the text. Typically, the text is a document being edited, and the pattern
searched for is a particular word supplied by the user. Efficient algorithms for
this problem (called “string matching”) can greatly aid the responsiveness of
the text-editing program. Among their many other applications, string-matching
algorithms search for particular patterns in DNA sequences. Internet search
engines also use them to find Web pages relevant to queries.

We formalize the string-matching problem as follows. We assume that the
text is an array $T[1 \ldots n]$ of length $n$ and that the pattern is an array
$P[1 \ldots m]$ of length $m \le n$. We further assume that the elements of $P$
and $T$ are characters drawn from a finite alphabet $\Sigma$. For example, we
may have $\Sigma = \{0, 1\}$ or $\Sigma = \{a, b, \ldots, z\}$. The character
arrays $P$ and $T$ are often called strings of characters.

Referring to Figure 32.1, we say that pattern $P$ occurs with shift $s$ in
text $T$ (or, equivalently, that pattern $P$ occurs beginning at position
$s + 1$ in text $T$) if $0 \le s \le n - m$ and
$T[s + 1 \ldots s + m] = P[1 \ldots m]$ (that is, if $T[s + j] = P[j]$, for
$1 \le j \le m$). If $P$ occurs with shift $s$ in $T$, then we call $s$ a valid
shift; otherwise, we call $s$ an invalid shift. The string-matching problem is
the problem of finding all valid shifts with which a given pattern $P$ occurs in
a given text $T$.


Figure 32.1 An example of the string-matching problem, where we want to find all
occurrences of the pattern P = abaa in the text T = abcabaabcabac. The pattern
occurs only once in the text, at shift s = 3, which we call a valid shift. A
vertical line connects each character of the pattern to its matching character
in the text, and all matched characters are shaded.

Algorithm             Preprocessing time     Matching time

Naive                 0                      $O((n - m + 1)m)$
Rabin-Karp            $\Theta(m)$            $O((n - m + 1)m)$
Finite automaton      $O(m |\Sigma|)$        $\Theta(n)$
Knuth-Morris-Pratt    $\Theta(m)$            $\Theta(n)$

Figure 32.2 The string-matching algorithms in this chapter and their
preprocessing and matching times.

Except for the naive brute-force algorithm, which we review in Section 32.1,
each string-matching algorithm in this chapter performs some preprocessing based
on the pattern and then finds all valid shifts; we call this latter phase
“matching.” Figure 32.2 shows the preprocessing and matching times for each of
the algorithms in this chapter. The total running time of each algorithm is the
sum of the preprocessing and matching times. Section 32.2 presents an
interesting string-matching algorithm, due to Rabin and Karp. Although the
$\Theta((n - m + 1)m)$ worst-case running time of this algorithm is no better
than that of the naive method, it works much better on average and in practice.
It also generalizes nicely to other pattern-matching problems. Section 32.3 then
describes a string-matching algorithm that begins by constructing a finite
automaton specifically designed to search for occurrences of the given
pattern $P$ in a text. This algorithm takes $O(m |\Sigma|)$ preprocessing time,
but only $\Theta(n)$ matching time. Section 32.4 presents the similar, but much
cleverer, Knuth-Morris-Pratt (or KMP) algorithm; it has the same $\Theta(n)$
matching time, and it reduces the preprocessing time to only $\Theta(m)$.

Notation  and  terminology 

We denote by $\Sigma^*$ (read “sigma-star”) the set of all finite-length strings
formed using characters from the alphabet $\Sigma$. In this chapter, we consider
only finite-length strings. The zero-length empty string, denoted
$\varepsilon$, also belongs to $\Sigma^*$. The length of a string $x$ is denoted
$|x|$. The concatenation of two strings $x$ and $y$, denoted $xy$, has length
$|x| + |y|$ and consists of the characters from $x$ followed by the characters
from $y$.

We say that a string $w$ is a prefix of a string $x$, denoted $w \sqsubset x$,
if $x = wy$ for some string $y \in \Sigma^*$. Note that if $w \sqsubset x$, then
$|w| \le |x|$. Similarly, we say that a string $w$ is a suffix of a string $x$,
denoted $w \sqsupset x$, if $x = yw$ for some $y \in \Sigma^*$. As with a
prefix, $w \sqsupset x$ implies $|w| \le |x|$. For example, we have
ab $\sqsubset$ abcca and cca $\sqsupset$ abcca. The empty string $\varepsilon$
is both a suffix and a prefix of every string. For any strings $x$ and $y$ and
any character $a$, we have $x \sqsupset y$ if and only if $xa \sqsupset ya$.

Figure 32.3 A graphical proof of Lemma 32.1. We suppose that $x \sqsupset z$ and
$y \sqsupset z$. The three parts of the figure illustrate the three cases of the
lemma. Vertical lines connect matching regions (shown shaded) of the strings.
(a) If $|x| \le |y|$, then $x \sqsupset y$. (b) If $|x| \ge |y|$, then
$y \sqsupset x$. (c) If $|x| = |y|$, then $x = y$.

Also note that $\sqsubset$ and $\sqsupset$ are transitive relations. The
following lemma will be useful later.

Lemma 32.1 (Overlapping-suffix lemma)

Suppose that $x$, $y$, and $z$ are strings such that $x \sqsupset z$ and
$y \sqsupset z$. If $|x| \le |y|$, then $x \sqsupset y$. If $|x| \ge |y|$,
then $y \sqsupset x$. If $|x| = |y|$, then $x = y$.

Proof  See  Figure  32.3  for  a  graphical  proof.  ■ 
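As a sanity check rather than a proof, the lemma can be tested exhaustively on all suffixes of a small string. In the sketch below, the helper names `is_suffix` and `check_overlapping_suffix` are ours, and Python's built-in `str.endswith` plays the role of the suffix relation.

```python
def is_suffix(w, x):
    """True if w is a suffix of x (the empty string is a suffix of every string)."""
    return x.endswith(w)

def check_overlapping_suffix(x, y, z):
    """Check the three conclusions of Lemma 32.1 for two suffixes x, y of z."""
    assert is_suffix(x, z) and is_suffix(y, z)
    if len(x) <= len(y) and not is_suffix(x, y):
        return False
    if len(x) >= len(y) and not is_suffix(y, x):
        return False
    if len(x) == len(y) and x != y:
        return False
    return True

z = "abcca"
suffixes = [z[i:] for i in range(len(z) + 1)]   # includes the empty string
print(all(check_overlapping_suffix(x, y, z)
          for x in suffixes for y in suffixes))   # True
```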

For brevity of notation, we denote the $k$-character prefix $P[1 \ldots k]$ of
the pattern $P[1 \ldots m]$ by $P_k$. Thus, $P_0 = \varepsilon$ and
$P_m = P = P[1 \ldots m]$. Similarly, we denote the $k$-character prefix of the
text $T$ by $T_k$. Using this notation, we can state the string-matching problem
as that of finding all shifts $s$ in the range $0 \le s \le n - m$ such that
$P \sqsupset T_{s+m}$.

In our pseudocode, we allow two equal-length strings to be compared for
equality as a primitive operation. If the strings are compared from left to
right and the comparison stops when a mismatch is discovered, we assume that the
time taken by such a test is a linear function of the number of matching
characters discovered. To be precise, the test “x == y” is assumed to take time
$\Theta(t + 1)$, where $t$ is the length of the longest string $z$ such that
$z \sqsubset x$ and $z \sqsubset y$. (We write $\Theta(t + 1)$ rather than
$\Theta(t)$ to handle the case in which $t = 0$; the first characters compared
do not match, but it takes a positive amount of time to perform this
comparison.)


32.1 The naive string-matching algorithm

The naive algorithm finds all valid shifts using a loop that checks the
condition $P[1 \ldots m] = T[s + 1 \ldots s + m]$ for each of the $n - m + 1$
possible values of $s$.

Naive-String-Matcher(T, P)

1  n = T.length
2  m = P.length
3  for s = 0 to n - m
4      if P[1..m] == T[s+1..s+m]
5          print “Pattern occurs with shift” s

Figure  32.4  portrays  the  naive  string-matching  procedure  as  sliding  a  “template” 
containing  the  pattern  over  the  text,  noting  for  which  shifts  all  of  the  characters 
on  the  template  equal  the  corresponding  characters  in  the  text.  The  for  loop  of 
lines  3-5  considers  each  possible  shift  explicitly.  The  test  in  line  4  determines 
whether  the  current  shift  is  valid;  this  test  implicitly  loops  to  check  corresponding 
character  positions  until  all  positions  match  successfully  or  a  mismatch  is  found. 
Line  5  prints  out  each  valid  shift  s. 
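For readers who want to run the procedure, here is a direct Python sketch of it. It differs from the pseudocode in two assumed ways: Python strings are 0-indexed, so the substring $T[s+1 \ldots s+m]$ becomes the slice `T[s:s + m]`, and the function returns the list of valid shifts instead of printing them.

```python
def naive_string_matcher(T, P):
    """Return every valid shift s (0 <= s <= n - m) of pattern P in text T."""
    n, m = len(T), len(P)
    shifts = []
    for s in range(n - m + 1):
        # T[s:s + m] is the text's T[s+1 .. s+m] in 0-indexed form.
        if T[s:s + m] == P:
            shifts.append(s)
    return shifts

print(naive_string_matcher("acaabc", "aab"))          # [2]
print(naive_string_matcher("abcabaabcabac", "abaa"))  # [3]
```

The two calls reproduce the examples of Figures 32.4 and 32.1, respectively.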

Procedure Naive-String-Matcher takes time $O((n - m + 1)m)$, and this
bound is tight in the worst case. For example, consider the text string $a^n$
(a string of $n$ a's) and the pattern $a^m$. For each of the $n - m + 1$
possible values of the shift $s$, the implicit loop on line 4 to compare
corresponding characters must execute $m$ times to validate the shift. The
worst-case running time is thus $\Theta((n - m + 1)m)$, which is $\Theta(n^2)$
if $m = \lfloor n/2 \rfloor$. Because it requires no preprocessing,
Naive-String-Matcher's running time equals its matching time.
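The worst-case count can be observed directly by making the implicit loop of line 4 explicit and counting character comparisons. The sketch below is our own instrumentation (the name `count_comparisons` is hypothetical); it stops each shift at the first mismatch, as the cost model above assumes.

```python
def count_comparisons(T, P):
    """Count character comparisons made by the naive matcher, stopping the
    scan of each shift at the first mismatch."""
    n, m = len(T), len(P)
    comparisons = 0
    for s in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if T[s + j] != P[j]:
                break
    return comparisons

n, m = 20, 10
print(count_comparisons("a" * n, "a" * m), (n - m + 1) * m)   # 110 110
print(count_comparisons("acaabc", "aab"))                     # 8
```

On the text $a^n$ and pattern $a^m$ every shift costs the full $m$ comparisons, whereas on the example of Figure 32.4 most shifts are rejected after one or two.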


Figure 32.4 The operation of the naive string matcher for the pattern P = aab
and the text T = acaabc. We can imagine the pattern P as a template that we
slide next to the text. (a)-(d) The four successive alignments tried by the
naive string matcher. In each part, vertical lines connect corresponding regions
found to match (shown shaded), and a jagged line connects the first mismatched
character found, if any. The algorithm finds one occurrence of the pattern, at
shift s = 2, shown in part (c).

As we shall see, Naive-String-Matcher is not an optimal procedure for this
problem. Indeed, in this chapter we shall see that the Knuth-Morris-Pratt
algorithm is much better in the worst case. The naive string-matcher is
inefficient because it entirely ignores information gained about the text for
one value of s when it considers other values of s. Such information can be
quite valuable, however. For example, if P = aaab and we find that s = 0 is
valid, then none of the shifts 1, 2, or 3 are valid, since T[4] = b. In the
following sections, we examine several ways to make effective use of this sort
of information.

Exercises 


32.1-1 

Show  the  comparisons  the  naive  string  matcher  makes  for  the  pattern  P  =  0001 
in  the  text  T  =  000010001010001. 


32.1-2 

Suppose  that  all  characters  in  the  pattern  P  are  different.  Show  how  to  accelerate 
Naive-String-Matcher  to  run  in  time  O(n)  on  an  n-character  text  T. 


32.1-3 

Suppose that pattern $P$ and text $T$ are randomly chosen strings of length $m$
and $n$, respectively, from the $d$-ary alphabet
$\Sigma_d = \{0, 1, \ldots, d - 1\}$, where $d \ge 2$. Show that the expected
number of character-to-character comparisons made by the implicit loop in line 4
of the naive algorithm is

$$(n - m + 1) \frac{1 - d^{-m}}{1 - d^{-1}} \le 2(n - m + 1)$$

over all executions of this loop. (Assume that the naive algorithm stops
comparing characters for a given shift once it finds a mismatch or matches the
entire pattern.) Thus, for randomly chosen strings, the naive algorithm is quite
efficient.


32.1-4 

Suppose we allow the pattern P to contain occurrences of a gap character ◊ that
can match an arbitrary string of characters (even one of zero length). For
example, the pattern ab◊ba◊c occurs in the text cabccbacbacab as

    c ab cc ba cba c ab
      ab ◊  ba ◊   c

and as

    c ab ccbac ba   c ab
      ab ◊     ba ◊ c

Note that the gap character may occur an arbitrary number of times in the
pattern but not at all in the text. Give a polynomial-time algorithm to
determine whether such a pattern P occurs in a given text T, and analyze the
running time of your algorithm.


32.2  The  Rabin-Karp  algorithm 

Rabin and Karp proposed a string-matching algorithm that performs well in
practice and that also generalizes to other algorithms for related problems,
such as two-dimensional pattern matching. The Rabin-Karp algorithm uses
$\Theta(m)$ preprocessing time, and its worst-case running time is
$\Theta((n - m + 1)m)$. Based on certain assumptions, however, its average-case
running time is better.

This  algorithm  makes  use  of  elementary  number-theoretic  notions  such  as  the 
equivalence  of  two  numbers  modulo  a  third  number.  You  might  want  to  refer  to 
Section  31.1  for  the  relevant  definitions. 

For expository purposes, let us assume that $\Sigma = \{0, 1, 2, \ldots, 9\}$,
so that each character is a decimal digit. (In the general case, we can assume
that each character is a digit in radix-$d$ notation, where $d = |\Sigma|$.) We
can then view a string of $k$ consecutive characters as representing a
length-$k$ decimal number. The character string 31415 thus corresponds to the
decimal number 31,415. Because we interpret the input characters as both
graphical symbols and digits, we find it convenient in this section to denote
them as we would digits, in our standard text font.

Given a pattern $P[1..m]$, let p denote its corresponding decimal value. In a similar
manner, given a text $T[1..n]$, let $t_s$ denote the decimal value of the length-m
substring $T[s+1..s+m]$, for $s = 0, 1, \ldots, n - m$. Certainly, $t_s = p$ if and only
if $T[s+1..s+m] = P[1..m]$; thus, s is a valid shift if and only if $t_s = p$. If we
could compute p in time $\Theta(m)$ and all the $t_s$ values in a total of $\Theta(n - m + 1)$ time,¹
then we could determine all valid shifts s in time $\Theta(m) + \Theta(n - m + 1) = \Theta(n)$
by comparing p with each of the $t_s$ values. (For the moment, let’s not worry about
the possibility that p and the $t_s$ values might be very large numbers.)

We can compute p in time $\Theta(m)$ using Horner’s rule (see Section 30.1):

$$p = P[m] + 10\,(P[m-1] + 10\,(P[m-2] + \cdots + 10\,(P[2] + 10\,P[1]) \cdots)) .$$

Similarly, we can compute $t_0$ from $T[1..m]$ in time $\Theta(m)$.


¹ We write $\Theta(n - m + 1)$ instead of $\Theta(n - m)$ because s takes on $n - m + 1$ different values. The
“+1” is significant in an asymptotic sense because when $m = n$, computing the lone $t_s$ value takes
$\Theta(1)$ time, not $\Theta(0)$ time.

To compute the remaining values $t_1, t_2, \ldots, t_{n-m}$ in time $\Theta(n - m)$, we observe
that we can compute $t_{s+1}$ from $t_s$ in constant time, since

$$t_{s+1} = 10\,(t_s - 10^{m-1}\,T[s+1]) + T[s+m+1] . \tag{32.1}$$

Subtracting $10^{m-1}\,T[s+1]$ removes the high-order digit from $t_s$, multiplying the
result by 10 shifts the number left by one digit position, and adding $T[s+m+1]$
brings in the appropriate low-order digit. For example, if $m = 5$ and $t_s = 31415$,
then we wish to remove the high-order digit $T[s+1] = 3$ and bring in the new
low-order digit (suppose it is $T[s+5+1] = 2$) to obtain

$$t_{s+1} = 10\,(31415 - 10000 \cdot 3) + 2 = 14152 .$$

If we precompute the constant $10^{m-1}$ (which we can do in time $O(\lg m)$ using the
techniques of Section 31.6, although for this application a straightforward $O(m)$-time
method suffices), then each execution of equation (32.1) takes a constant number
of arithmetic operations. Thus, we can compute p in time $\Theta(m)$, and we can
compute all of $t_0, t_1, \ldots, t_{n-m}$ in time $\Theta(n - m + 1)$. Therefore, we can find all
occurrences of the pattern $P[1..m]$ in the text $T[1..n]$ with $\Theta(m)$ preprocessing
time and $\Theta(n - m + 1)$ matching time.
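The computation just described can be sketched in Python. This is only an illustration under stated assumptions: the function name `window_values` is hypothetical, the text is assumed to be a string of decimal digits, and indices are 0-based, so the text's T[s+1] is `digits[s]`. It relies on Python's arbitrary-precision integers and therefore ignores the cost of arithmetic on large numbers, exactly as the discussion does at this point.

```python
def window_values(text: str, m: int) -> list[int]:
    """Return the decimal value t_s of every length-m window of a digit string."""
    n = len(text)
    if m < 1 or m > n:
        return []
    digits = [int(c) for c in text]

    # Horner's rule gives t_0 with m multiplications and additions.
    t = 0
    for i in range(m):
        t = 10 * t + digits[i]

    high = 10 ** (m - 1)  # value of a 1 in the high-order position
    values = [t]
    for s in range(n - m):
        # Equation (32.1): drop the high-order digit, shift left, add the new digit.
        t = 10 * (t - high * digits[s]) + digits[s + m]
        values.append(t)
    return values
```

For the example in the text, `window_values("314152", 5)` yields the two window values 31415 and 14152.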

Until now, we have intentionally overlooked one problem: p and $t_s$ may be
too large to work with conveniently. If P contains m characters, then we cannot
reasonably assume that each arithmetic operation on p (which is m digits long)
takes “constant time.” Fortunately, we can solve this problem easily, as Figure 32.5
shows: compute p and the $t_s$ values modulo a suitable modulus q. We can compute
p modulo q in $\Theta(m)$ time and all the $t_s$ values modulo q in $\Theta(n - m + 1)$ time.
If we choose the modulus q as a prime such that $10q$ just fits within one computer
word, then we can perform all the necessary computations with single-precision
arithmetic. In general, with a d-ary alphabet $\{0, 1, \ldots, d - 1\}$, we choose q so
that $dq$ fits within a computer word and adjust the recurrence equation (32.1) to
work modulo q, so that it becomes

$$t_{s+1} = (d\,(t_s - T[s+1]\,h) + T[s+m+1]) \bmod q , \tag{32.2}$$

where $h \equiv d^{m-1} \pmod{q}$ is the value of the digit “1” in the high-order position
of an m-digit text window.
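As a small worked instance of equation (32.2), the sketch below recomputes the update described in the caption of Figure 32.5, where the window 31415 has value 7 modulo 13 and the next window is 14152. The helper name `roll` and its parameter names are illustrative assumptions. Python's `%` operator returns a nonnegative result for a positive modulus, so the possibly negative intermediate value needs no special handling here; in languages where the remainder can be negative, an extra adjustment would be required.

```python
def roll(t_s: int, old_digit: int, new_digit: int, d: int, h: int, q: int) -> int:
    """Apply equation (32.2): compute t_{s+1} mod q from t_s mod q."""
    return (d * (t_s - old_digit * h) + new_digit) % q


d, m, q = 10, 5, 13
h = pow(d, m - 1, q)          # 10^4 mod 13, by modular exponentiation
t_first = 31415 % q           # value of the first window modulo 13
t_next = roll(t_first, old_digit=3, new_digit=2, d=d, h=h, q=q)
```

Here `h` is 3 and `t_first` is 7, so the update evaluates (7 - 3·3)·10 + 2 modulo 13, which agrees with 14152 mod 13.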

The solution of working modulo q is not perfect, however: $t_s \equiv p \pmod{q}$
does not imply that $t_s = p$. On the other hand, if $t_s \not\equiv p \pmod{q}$, then we
definitely have that $t_s \ne p$, so that shift s is invalid. We can thus use the test
$t_s \equiv p \pmod{q}$ as a fast heuristic test to rule out invalid shifts s. Any shift s for
which $t_s \equiv p \pmod{q}$ must be tested further to see whether s is really valid or
we just have a spurious hit. This additional test explicitly checks the condition


[Figure 32.5, parts (a) through (c). (a) The text 2359023141526739921 with a shaded
length-5 window whose value modulo 13 is 7. (b) The same text, with positions numbered
1 through 19 and the value modulo 13 of each length-5 window. (c) The window update:

$$\begin{aligned}
14152 &\equiv (31415 - 3 \cdot 10000) \cdot 10 + 2 \pmod{13} \\
      &\equiv (7 - 3 \cdot 3) \cdot 10 + 2 \pmod{13} \\
      &\equiv 8 \pmod{13} .]
\end{aligned}$$

Figure 32.5  The Rabin-Karp algorithm. Each character is a decimal digit, and we compute values
modulo 13. (a) A text string. A window of length 5 is shown shaded. The numerical value of the
shaded number, computed modulo 13, yields the value 7. (b) The same text string with values
computed modulo 13 for each possible position of a length-5 window. Assuming the pattern P = 31415,
we look for windows whose value modulo 13 is 7, since $31415 \equiv 7 \pmod{13}$. The algorithm finds
two such windows, shown shaded in the figure. The first, beginning at text position 7, is indeed an
occurrence of the pattern, while the second, beginning at text position 13, is a spurious hit. (c) How
to compute the value for a window in constant time, given the value for the previous window. The
first window has value 31415. Dropping the high-order digit 3, shifting left (multiplying by 10), and
then adding in the low-order digit 2 gives us the new value 14152. Because all computations are
performed modulo 13, the value for the first window is 7, and the value for the new window is 8.
$P[1..m] = T[s+1..s+m]$. If q is large enough, then we hope that spurious
hits occur infrequently enough that the cost of the extra checking is low.

The following procedure makes these ideas precise. The inputs to the procedure
are the text T, the pattern P, the radix d to use (which is typically taken to be $|\Sigma|$),
and the prime q to use.

Rabin-Karp-Matcher(T, P, d, q)

 1  n = T.length
 2  m = P.length
 3  h = d^(m-1) mod q
 4  p = 0
 5  t_0 = 0
 6  for i = 1 to m                  // preprocessing
 7      p = (d p + P[i]) mod q
 8      t_0 = (d t_0 + T[i]) mod q
 9  for s = 0 to n - m              // matching
10      if p == t_s
11          if P[1..m] == T[s+1..s+m]
12              print “Pattern occurs with shift” s
13      if s < n - m
14          t_(s+1) = (d (t_s - T[s+1] h) + T[s+m+1]) mod q

The procedure Rabin-Karp-Matcher works as follows. All characters are
interpreted as radix-d digits. The subscripts on t are provided only for clarity; the
program works correctly if all the subscripts are dropped. Line 3 initializes h to the
value of the high-order digit position of an m-digit window. Lines 4-8 compute p
as the value of $P[1..m] \bmod q$ and $t_0$ as the value of $T[1..m] \bmod q$. The for
loop of lines 9-14 iterates through all possible shifts s, maintaining the following
invariant:

    Whenever line 10 is executed, $t_s = T[s+1..s+m] \bmod q$.

If $p = t_s$ in line 10 (a “hit”), then line 11 checks to see whether $P[1..m] =
T[s+1..s+m]$ in order to rule out the possibility of a spurious hit. Line 12 prints
out any valid shifts that are found. If $s < n - m$ (checked in line 13), then the for
loop will execute at least one more time, and so line 14 first executes to ensure that
the loop invariant holds when we get back to line 10. Line 14 computes the value
of $t_{s+1} \bmod q$ from the value of $t_s \bmod q$ in constant time using equation (32.2)
directly.
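The procedure translates fairly directly into Python. The sketch below is one possible rendering, not a definitive implementation: the name `rabin_karp_matcher` is illustrative, the input is assumed to consist of decimal-digit characters (so that `int(c)` gives a digit value), shifts are collected in lists instead of being printed, and spurious hits are recorded separately, which the pseudocode does not do.

```python
def rabin_karp_matcher(text: str, pattern: str, d: int = 10, q: int = 13):
    """Return (valid_shifts, spurious_hits) for a digit-string text and pattern."""
    n, m = len(text), len(pattern)
    valid, spurious = [], []
    if m == 0 or m > n:
        return valid, spurious

    T = [int(c) for c in text]       # digit values of the text
    P = [int(c) for c in pattern]    # digit values of the pattern
    h = pow(d, m - 1, q)             # weight of the high-order digit, mod q

    # Preprocessing: Horner's rule modulo q for the pattern and the first window.
    p = t = 0
    for i in range(m):
        p = (d * p + P[i]) % q
        t = (d * t + T[i]) % q

    # Matching: t always holds the value of window T[s .. s+m-1] modulo q.
    for s in range(n - m + 1):
        if p == t:                    # a hit; verify it character by character
            if P == T[s:s + m]:
                valid.append(s)
            else:
                spurious.append(s)
        if s < n - m:                 # roll the window, equation (32.2)
            t = (d * (t - T[s] * h) + T[s + m]) % q
    return valid, spurious
```

On the text and pattern from the caption of Figure 32.5, with q = 13, this reports the valid shift 6 (text position 7) and one spurious hit at shift 12 (text position 13).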

Rabin-Karp-Matcher takes $\Theta(m)$ preprocessing time, and its matching time
is $\Theta((n - m + 1)m)$ in the worst case, since (like the naive string-matching
algorithm) the Rabin-Karp algorithm explicitly verifies every valid shift. If $P = a^m$
and $T = a^n$, then verifying takes time $\Theta((n - m + 1)m)$, since each of the $n - m + 1$
possible shifts is valid.

In many applications, we expect few valid shifts, perhaps some constant c of
them. In such applications, the expected matching time of the algorithm is only
$O((n - m + 1) + cm) = O(n + m)$, plus the time required to process spurious
hits. We can base a heuristic analysis on the assumption that reducing values
modulo q acts like a random mapping from $\Sigma^*$ to $\mathbb{Z}_q$. (See the discussion on the use of
division for hashing in Section 11.3.1. It is difficult to formalize and prove such an
assumption, although one viable approach is to assume that q is chosen randomly
from integers of the appropriate size. We shall not pursue this formalization here.)
We can then expect that the number of spurious hits is $O(n/q)$, since we can
estimate the chance that an arbitrary $t_s$ will be equivalent to p, modulo q, as $1/q$.
Since there are $O(n)$ positions at which the test of line 10 fails and we spend $O(m)$
time for each hit, the expected matching time taken by the Rabin-Karp algorithm
is

$$O(n) + O(m\,(v + n/q)) ,$$

where v is the number of valid shifts. This running time is $O(n)$ if $v = O(1)$ and
we choose $q \ge m$. That is, if the expected number of valid shifts is small ($O(1)$)
and we choose the prime q to be larger than the length of the pattern, then we
can expect the Rabin-Karp procedure to use only $O(n + m)$ matching time. Since
$m \le n$, this expected matching time is $O(n)$.

Exercises 


32.2-1 

Working modulo q = 11, how many spurious hits does the Rabin-Karp matcher
encounter in the text T = 3141592653589793 when looking for the pattern P = 26?


32.2-2 

How  would  you  extend  the  Rabin-Karp  method  to  the  problem  of  searching  a  text 
string  for  an  occurrence  of  any  one  of  a  given  set  of  k  patterns?  Start  by  assuming 
that  all  k  patterns  have  the  same  length.  Then  generalize  your  solution  to  allow  the 
patterns  to  have  different  lengths. 


32.2-3 

Show  how  to  extend  the  Rabin-Karp  method  to  handle  the  problem  of  looking  for 
a  given  m  x  m  pattern  in  an  n  x  n  array  of  characters.  (The  pattern  may  be  shifted 
vertically  and  horizontally,  but  it  may  not  be  rotated.) 
32.2-4

Alice has a copy of a long n-bit file $A = \langle a_{n-1}, a_{n-2}, \ldots, a_0 \rangle$, and Bob similarly
has an n-bit file $B = \langle b_{n-1}, b_{n-2}, \ldots, b_0 \rangle$. Alice and Bob wish to know if their
files are identical. To avoid transmitting all of A or B, they use the following fast
probabilistic check. Together, they select a prime $q > 1000n$ and randomly select
an integer x from $\{0, 1, \ldots, q - 1\}$. Then, Alice evaluates

$$A(x) = \Bigl( \sum_{i=0}^{n-1} a_i x^i \Bigr) \bmod q$$

and Bob similarly evaluates $B(x)$. Prove that if $A \ne B$, there is at most one
chance in 1000 that $A(x) = B(x)$, whereas if the two files are the same, $A(x)$ is
necessarily the same as $B(x)$. (Hint: See Exercise 31.4-4.)


32.3  String  matching  with  finite  automata 


Many string-matching algorithms build a finite automaton (a simple machine for
processing information) that scans the text string T for all occurrences of the
pattern P. This section presents a method for building such an automaton. These
string-matching automata are very efficient: they examine each text character
exactly once, taking constant time per text character. The matching time used, after
preprocessing the pattern to build the automaton, is therefore $\Theta(n)$. The time to
build the automaton, however, can be large if $\Sigma$ is large. Section 32.4 describes a
clever way around this problem.

We  begin  this  section  with  the  definition  of  a  finite  automaton.  We  then  examine 
a  special  string-matching  automaton  and  show  how  to  use  it  to  find  occurrences 
of  a  pattern  in  a  text.  Finally,  we  shall  show  how  to  construct  the  string-matching 
automaton  for  a  given  input  pattern. 

Finite  automata 

A finite automaton M, illustrated in Figure 32.6, is a 5-tuple $(Q, q_0, A, \Sigma, \delta)$,
where

•  Q is a finite set of states,

•  $q_0 \in Q$ is the start state,

•  $A \subseteq Q$ is a distinguished set of accepting states,

•  $\Sigma$ is a finite input alphabet,

•  $\delta$ is a function from $Q \times \Sigma$ into Q, called the transition function of M.
Figure 32.6  A simple two-state finite automaton with state set $Q = \{0, 1\}$, start state $q_0 = 0$,
and input alphabet $\Sigma = \{a, b\}$. (a) A tabular representation of the transition function $\delta$. (b) An
equivalent state-transition diagram. State 1, shown blackened, is the only accepting state. Directed
edges represent transitions. For example, the edge from state 1 to state 0 labeled b indicates that
$\delta(1, b) = 0$. This automaton accepts those strings that end in an odd number of a’s. More precisely,
it accepts a string x if and only if $x = yz$, where $y = \varepsilon$ or y ends with a b, and $z = a^k$, where k is
odd. For example, on input abaaa, including the start state, this automaton enters the sequence of
states $\langle 0, 1, 0, 1, 0, 1 \rangle$, and so it accepts this input. For input abbaa, it enters the sequence of states
$\langle 0, 1, 0, 0, 1, 0 \rangle$, and so it rejects this input.

The finite automaton begins in state $q_0$ and reads the characters of its input string
one at a time. If the automaton is in state q and reads input character a, it moves
(“makes a transition”) from state q to state $\delta(q, a)$. Whenever its current state q is
a member of A, the machine M has accepted the string read so far. An input that
is not accepted is rejected.

A finite automaton M induces a function $\phi$, called the final-state function,
from $\Sigma^*$ to Q such that $\phi(w)$ is the state M ends up in after scanning the string w.
Thus, M accepts a string w if and only if $\phi(w) \in A$. We define the function $\phi$
recursively, using the transition function:

$$\phi(\varepsilon) = q_0 ,$$

$$\phi(wa) = \delta(\phi(w), a) \quad \text{for } w \in \Sigma^*,\ a \in \Sigma .$$
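The recursive definition of the final-state function amounts to a simple loop over the input. The Python sketch below simulates the two-state automaton described in the caption of Figure 32.6. The transition table is reconstructed from that caption (it is the table consistent with the two state sequences listed there), and the names `DELTA`, `ACCEPTING`, `state_sequence`, `final_state`, and `accepts` are illustrative assumptions.

```python
# Transition function of the two-state automaton, as a dictionary
# keyed by (state, character). Reconstructed from the caption's examples.
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 0}
ACCEPTING = {1}
START = 0


def state_sequence(w: str) -> list[int]:
    """States visited while scanning w, including the start state."""
    states = [START]
    for ch in w:
        states.append(DELTA[(states[-1], ch)])
    return states


def final_state(w: str) -> int:
    """The final-state function: the state reached after scanning w."""
    return state_sequence(w)[-1]


def accepts(w: str) -> bool:
    """A string is accepted when its final state is an accepting state."""
    return final_state(w) in ACCEPTING
```

Scanning abaaa visits the states 0, 1, 0, 1, 0, 1 and the input is accepted; scanning abbaa visits 0, 1, 0, 0, 1, 0 and the input is rejected, matching the caption.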

String-matching  automata 

For a given pattern P, we construct a string-matching automaton in a preprocessing
step before using it to search the text string. Figure 32.7 illustrates how we
construct the automaton for the pattern P = ababaca. From now on, we shall
assume that P is a given fixed pattern string; for brevity, we shall not indicate the
dependence upon P in our notation.

In order to specify the string-matching automaton corresponding to a given pattern
$P[1..m]$, we first define an auxiliary function $\sigma$, called the suffix function
corresponding to P. The function $\sigma$ maps $\Sigma^*$ to $\{0, 1, \ldots, m\}$ such that $\sigma(x)$ is the
length of the longest prefix of P that is also a suffix of x:

$$\sigma(x) = \max \{ k : P_k \sqsupset x \} . \tag{32.3}$$
Figure 32.7  (a) A state-transition diagram for the string-matching automaton that accepts all
strings ending in the string ababaca. State 0 is the start state, and state 7 (shown blackened) is
the only accepting state. A directed edge from state i to state j labeled a represents $\delta(i, a) = j$. The
right-going edges forming the “spine” of the automaton, shown heavy in the figure, correspond to
successful matches between pattern and input characters. The left-going edges correspond to failing
matches. Some edges corresponding to failing matches are omitted; by convention, if a state i has
no outgoing edge labeled a for some $a \in \Sigma$, then $\delta(i, a) = 0$. (b) The corresponding transition
function $\delta$, and the pattern string P = ababaca. The entries corresponding to successful matches
between pattern and input characters are shown shaded. (c) The operation of the automaton on the
text T = abababacaba. Under each text character $T[i]$ appears the state $\phi(T_i)$ that the automaton
is in after processing the prefix $T_i$. The automaton finds one occurrence of the pattern, ending in
position 9.

The suffix function $\sigma$ is well defined since the empty string $P_0 = \varepsilon$ is a suffix
of every string. As examples, for the pattern P = ab, we have $\sigma(\varepsilon) = 0$,
$\sigma(\text{ccaca}) = 1$, and $\sigma(\text{ccab}) = 2$. For a pattern P of length m, we have
$\sigma(x) = m$ if and only if $P \sqsupset x$. From the definition of the suffix function,
$x \sqsupset y$ implies $\sigma(x) \le \sigma(y)$.
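The suffix function can be evaluated directly from its definition by trying prefix lengths from largest to smallest. The following Python sketch does exactly that; the name `sigma` and the argument order are illustrative choices, and no attempt is made at efficiency.

```python
def sigma(pattern: str, x: str) -> int:
    """Length of the longest prefix of `pattern` that is also a suffix of `x`."""
    # A prefix longer than x cannot be a suffix of x, so start at min(|P|, |x|).
    for k in range(min(len(pattern), len(x)), 0, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0  # the empty prefix P_0 is a suffix of every string
```

With the pattern ab, this reproduces the three examples above: `sigma("ab", "")` is 0, `sigma("ab", "ccaca")` is 1, and `sigma("ab", "ccab")` is 2.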

We define the string-matching automaton that corresponds to a given pattern
$P[1..m]$ as follows:

•  The state set Q is $\{0, 1, \ldots, m\}$. The start state $q_0$ is state 0, and state m is the
only accepting state.

•  The transition function $\delta$ is defined by the following equation, for any state q
and character a:

$$\delta(q, a) = \sigma(P_q a) . \tag{32.4}$$

We define $\delta(q, a) = \sigma(P_q a)$ because we want to keep track of the longest prefix
of the pattern P that has matched the text string T so far. We consider the
most recently read characters of T. In order for a substring of T (let’s say the
substring ending at $T[i]$) to match some prefix $P_j$ of P, this prefix $P_j$ must be a
suffix of $T_i$. Suppose that $q = \phi(T_i)$, so that after reading $T_i$, the automaton is in
state q. We design the transition function $\delta$ so that this state number, q, tells us the
length of the longest prefix of P that matches a suffix of $T_i$. That is, in state q,
$P_q \sqsupset T_i$ and $q = \sigma(T_i)$. (Whenever $q = m$, all m characters of P match a suffix
of $T_i$, and so we have found a match.) Thus, since $\phi(T_i)$ and $\sigma(T_i)$ both equal q,
we shall see (in Theorem 32.4, below) that the automaton maintains the following
invariant:

$$\phi(T_i) = \sigma(T_i) . \tag{32.5}$$

If the automaton is in state q and reads the next character $T[i+1] = a$, then we
want the transition to lead to the state corresponding to the longest prefix of P that
is a suffix of $T_i a$, and that state is $\sigma(T_i a)$. Because $P_q$ is the longest prefix of P
that is a suffix of $T_i$, the longest prefix of P that is a suffix of $T_i a$ is not only $\sigma(T_i a)$,
but also $\sigma(P_q a)$. (Lemma 32.3, below, proves that $\sigma(T_i a) = \sigma(P_q a)$.)
Thus, when the automaton is in state q, we want the transition function on
character a to take the automaton to state $\sigma(P_q a)$.

There are two cases to consider. In the first case, $a = P[q+1]$, so that the
character a continues to match the pattern; in this case, because $\delta(q, a) = q + 1$, the
transition continues to go along the “spine” of the automaton (the heavy edges in
Figure 32.7). In the second case, $a \ne P[q+1]$, so that a does not continue to match
the pattern. Here, we must find a smaller prefix of P that is also a suffix of $T_i a$.
Because the preprocessing step matches the pattern against itself when creating the
string-matching automaton, the transition function quickly identifies the longest
such smaller prefix of P.

Let’s look at an example. The string-matching automaton of Figure 32.7 has
$\delta(5, c) = 6$, illustrating the first case, in which the match continues. To illustrate
the second case, observe that the automaton of Figure 32.7 has $\delta(5, b) = 4$.
We make this transition because if the automaton reads a b in state $q = 5$, then
$P_q b$ = ababab, and the longest prefix of P that is also a suffix of ababab is
$P_4$ = abab.

Figure 32.8  An illustration for the proof of Lemma 32.2. The figure shows that $r \le \sigma(x) + 1$,
where $r = \sigma(xa)$.

To clarify the operation of a string-matching automaton, we now give a simple,
efficient program for simulating the behavior of such an automaton (represented
by its transition function $\delta$) in finding occurrences of a pattern P of length m in an
input text $T[1..n]$. As for any string-matching automaton for a pattern of length m,
the state set Q is $\{0, 1, \ldots, m\}$, the start state is 0, and the only accepting state is
state m.

Finite-Automaton-Matcher(T, δ, m)

1  n = T.length
2  q = 0
3  for i = 1 to n
4      q = δ(q, T[i])
5      if q == m
6          print “Pattern occurs with shift” i - m

From the simple loop structure of Finite-Automaton-Matcher, we can easily
see that its matching time on a text string of length n is $\Theta(n)$. This matching
time, however, does not include the preprocessing time required to compute the
transition function $\delta$. We address this problem later, after first proving that the
procedure Finite-Automaton-Matcher operates correctly.

Consider how the automaton operates on an input text $T[1..n]$. We shall prove
that the automaton is in state $\sigma(T_i)$ after scanning character $T[i]$. Since $\sigma(T_i) = m$
if and only if $P \sqsupset T_i$, the machine is in the accepting state m if and only if it has
just scanned the pattern P. To prove this result, we make use of the following two
lemmas about the suffix function $\sigma$.

Lemma 32.2 (Suffix-function inequality)

For any string x and character a, we have $\sigma(xa) \le \sigma(x) + 1$.

Proof  Referring to Figure 32.8, let $r = \sigma(xa)$. If $r = 0$, then the conclusion
$\sigma(xa) = r \le \sigma(x) + 1$ is trivially satisfied, by the nonnegativity of $\sigma(x)$. Now
assume that $r > 0$. Then, $P_r \sqsupset xa$, by the definition of $\sigma$. Thus, $P_{r-1} \sqsupset x$, by
dropping the a from the end of $P_r$ and from the end of $xa$. Therefore, $r - 1 \le \sigma(x)$,
since $\sigma(x)$ is the largest k such that $P_k \sqsupset x$, and thus $\sigma(xa) = r \le \sigma(x) + 1$.  ■

Figure 32.9  An illustration for the proof of Lemma 32.3. The figure shows that $r = \sigma(P_q a)$,
where $q = \sigma(x)$ and $r = \sigma(xa)$.

Lemma 32.3 (Suffix-function recursion lemma)

For any string x and character a, if $q = \sigma(x)$, then $\sigma(xa) = \sigma(P_q a)$.

Proof  From the definition of $\sigma$, we have $P_q \sqsupset x$. As Figure 32.9 shows, we
also have $P_q a \sqsupset xa$. If we let $r = \sigma(xa)$, then $P_r \sqsupset xa$ and, by Lemma 32.2,
$r \le q + 1$. Thus, we have $|P_r| = r \le q + 1 = |P_q a|$. Since $P_q a \sqsupset xa$, $P_r \sqsupset xa$,
and $|P_r| \le |P_q a|$, Lemma 32.1 implies that $P_r \sqsupset P_q a$. Therefore, $r \le \sigma(P_q a)$,
that is, $\sigma(xa) \le \sigma(P_q a)$. But we also have $\sigma(P_q a) \le \sigma(xa)$, since $P_q a \sqsupset xa$.
Thus, $\sigma(xa) = \sigma(P_q a)$.  ■
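Both lemmas can be spot-checked by brute force on short strings. The Python sketch below is only a sanity check under stated assumptions, not a substitute for the proofs: it enumerates every string x up to a chosen length over a small alphabet and confirms the inequality of Lemma 32.2 and the identity of Lemma 32.3. The names `sigma` and `check_lemmas`, the pattern ababaca, and the alphabet {a, b, c} are illustrative choices.

```python
from itertools import product


def sigma(pattern: str, x: str) -> int:
    """Length of the longest prefix of `pattern` that is a suffix of `x`."""
    for k in range(min(len(pattern), len(x)), 0, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0


def check_lemmas(pattern: str, alphabet: str, max_len: int) -> bool:
    """Check Lemmas 32.2 and 32.3 for every string x with |x| <= max_len."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            x = "".join(chars)
            q = sigma(pattern, x)
            for a in alphabet:
                r = sigma(pattern, x + a)
                if r > q + 1:                              # Lemma 32.2
                    return False
                if r != sigma(pattern, pattern[:q] + a):   # Lemma 32.3
                    return False
    return True
```

A call such as `check_lemmas("ababaca", "abc", 6)` examines 1093 strings x and three extensions of each.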

We  are  now  ready  to  prove  our  main  theorem  characterizing  the  behavior  of  a 
string-matching  automaton  on  a  given  input  text.  As  noted  above,  this  theorem 
shows  that  the  automaton  is  merely  keeping  track,  at  each  step,  of  the  longest 
prefix  of  the  pattern  that  is  a  suffix  of  what  has  been  read  so  far.  In  other  words, 
the  automaton  maintains  the  invariant  (32.5). 

Theorem 32.4

If $\phi$ is the final-state function of a string-matching automaton for a given pattern P
and $T[1..n]$ is an input text for the automaton, then

$$\phi(T_i) = \sigma(T_i)$$

for $i = 0, 1, \ldots, n$.

Proof  The proof is by induction on i. For $i = 0$, the theorem is trivially true,
since $T_0 = \varepsilon$. Thus, $\phi(T_0) = 0 = \sigma(T_0)$.

Now, we assume that $\phi(T_i) = \sigma(T_i)$ and prove that $\phi(T_{i+1}) = \sigma(T_{i+1})$. Let q
denote $\phi(T_i)$, and let a denote $T[i+1]$. Then,

$$\begin{aligned}
\phi(T_{i+1}) &= \phi(T_i a)          && \text{(by the definitions of } T_{i+1} \text{ and } a) \\
              &= \delta(\phi(T_i), a) && \text{(by the definition of } \phi) \\
              &= \delta(q, a)         && \text{(by the definition of } q) \\
              &= \sigma(P_q a)        && \text{(by the definition (32.4) of } \delta) \\
              &= \sigma(T_i a)        && \text{(by Lemma 32.3 and induction)} \\
              &= \sigma(T_{i+1})      && \text{(by the definition of } T_{i+1}) .
\end{aligned}$$

■


By Theorem 32.4, if the machine enters state q on line 4, then q is the largest
value such that $P_q \sqsupset T_i$. Thus, we have $q = m$ on line 5 if and only if the machine
has just scanned an occurrence of the pattern P. We conclude that
Finite-Automaton-Matcher operates correctly.

Computing  the  transition  function 

The following procedure computes the transition function $\delta$ from a given pattern
$P[1..m]$.

Compute-Transition-Function(P, Σ)

1  m = P.length
2  for q = 0 to m
3      for each character a ∈ Σ
4          k = min(m + 1, q + 2)
5          repeat
6              k = k - 1
7          until P_k ⊐ P_q a
8          δ(q, a) = k
9  return δ

This procedure computes $\delta(q, a)$ in a straightforward manner according to its
definition in equation (32.4). The nested loops beginning on lines 2 and 3 consider
all states q and all characters a, and lines 4-8 set $\delta(q, a)$ to be the largest k such
that $P_k \sqsupset P_q a$. The code starts with the largest conceivable value of k, which is
$\min(m, q + 1)$. It then decreases k until $P_k \sqsupset P_q a$, which must eventually occur,
since $P_0 = \varepsilon$ is a suffix of every string.
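The two procedures of this section can be combined into a short Python sketch. It is an illustration under stated assumptions, not a definitive implementation: the function names are transliterations of the pseudocode names, the transition function is stored in a dictionary keyed by (state, character), the alphabet is passed as a string of its characters, and shifts are returned in a list instead of being printed. A text character outside the supplied alphabet would raise a `KeyError`.

```python
def compute_transition_function(pattern: str, alphabet: str) -> dict:
    """Build delta(q, a) = sigma(P_q a) straight from the definition."""
    m = len(pattern)
    delta = {}
    for q in range(m + 1):
        for a in alphabet:
            k = min(m, q + 1)                      # largest conceivable value
            # Decrease k until P_k is a suffix of P_q a; k = 0 always works.
            while not (pattern[:q] + a).endswith(pattern[:k]):
                k -= 1
            delta[(q, a)] = k
    return delta


def finite_automaton_matcher(text: str, delta: dict, m: int) -> list[int]:
    """Return every shift at which the automaton reaches accepting state m."""
    shifts = []
    q = 0
    for i, ch in enumerate(text, start=1):         # i is the 1-based position
        q = delta[(q, ch)]
        if q == m:
            shifts.append(i - m)
    return shifts
```

For the pattern ababaca over the alphabet {a, b, c}, the resulting table has `delta[(5, "c")] == 6` and `delta[(5, "b")] == 4`, as in the example earlier in this section, and running the matcher on the text abababacaba reports the single shift 2, that is, an occurrence ending at position 9.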

The running time of Compute-Transition-Function is $O(m^3 |\Sigma|)$,
because the outer loops contribute a factor of $m\,|\Sigma|$, the inner repeat loop can run
at most $m + 1$ times, and the test $P_k \sqsupset P_q a$ on line 7 can require comparing up
to m characters. Much faster procedures exist; by utilizing some cleverly
computed information about the pattern P (see Exercise 32.4-8), we can improve the
time required to compute $\delta$ from P to $O(m\,|\Sigma|)$. With this improved procedure for
computing $\delta$, we can find all occurrences of a length-m pattern in a length-n text
over an alphabet $\Sigma$ with $O(m\,|\Sigma|)$ preprocessing time and $\Theta(n)$ matching time.

Exercises 


32.3-1 

Construct  the  string-matching  automaton  for  the  pattern  P  =  aabab  and  illustrate 
its  operation  on  the  text  string  T  =  aaababaabaababaab. 


32.3-2

Draw a state-transition diagram for a string-matching automaton for the pattern
ababbabbababbababbabb over the alphabet $\Sigma = \{a, b\}$.

32.3-3

We call a pattern P nonoverlappable if $P_k \sqsupset P_q$ implies $k = 0$ or $k = q$.
Describe the state-transition diagram of the string-matching automaton for a
nonoverlappable pattern.

32.3-4 ★

Given  two  patterns  P  and  P',  describe  how  to  construct  a  finite  automaton  that 
determines  all  occurrences  of  either  pattern.  Try  to  minimize  the  number  of  states 
in  your  automaton. 


32.3-5 

Given a pattern P containing gap characters (see Exercise 32.1-4), show how to
build a finite automaton that can find an occurrence of P in a text T in $O(n)$
matching time, where $n = |T|$.


★  32.4  The  Knuth-Morris-Pratt  algorithm 

We now present a linear-time string-matching algorithm due to Knuth, Morris, and
Pratt. This algorithm avoids computing the transition function $\delta$ altogether, and its
matching time is $\Theta(n)$ using just an auxiliary function $\pi$, which we precompute
from the pattern in time $\Theta(m)$ and store in an array $\pi[1..m]$. The array $\pi$ allows
us to compute the transition function $\delta$ efficiently (in an amortized sense) “on the
fly” as needed. Loosely speaking, for any state $q = 0, 1, \ldots, m$ and any character
$a \in \Sigma$, the value $\pi[q]$ contains the information we need to compute $\delta(q, a)$ but
that does not depend on a. Since the array $\pi$ has only m entries, whereas $\delta$ has
$\Theta(m\,|\Sigma|)$ entries, we save a factor of $|\Sigma|$ in the preprocessing time by computing $\pi$
rather than $\delta$.

The  prefix  function  for  a  pattern 

The prefix function $\pi$ for a pattern encapsulates knowledge about how the pattern
matches against shifts of itself. We can take advantage of this information to
avoid testing useless shifts in the naive pattern-matching algorithm and to avoid
precomputing the full transition function $\delta$ for a string-matching automaton.

Consider  the  operation  of  the  naive  string  matcher.  Figure  32.10(a)  shows  a 
particular shift s of a template containing the pattern P = ababaca against a
text  T .  For  this  example,  q  =  5  of  the  characters  have  matched  successfully,  but 
the  6th  pattern  character  fails  to  match  the  corresponding  text  character.  The  infor¬ 
mation  that  q  characters  have  matched  successfully  determines  the  corresponding 
text  characters.  Knowing  these  q  text  characters  allows  us  to  determine  immedi¬ 
ately  that  certain  shifts  are  invalid.  In  the  example  of  the  figure,  the  shift  s  +  1  is 
necessarily  invalid,  since  the  first  pattern  character  (a)  would  be  aligned  with  a  text 
character  that  we  know  does  not  match  the  first  pattern  character,  but  does  match 
the  second  pattern  character  (b).  The  shift  s'  =  s  +  2  shown  in  part  (b)  of  the  fig¬ 
ure,  however,  aligns  the  first  three  pattern  characters  with  three  text  characters  that 
must  necessarily  match.  In  general,  it  is  useful  to  know  the  answer  to  the  following 
question: 

Given that pattern characters $P[1..q]$ match text characters $T[s+1..s+q]$,
what is the least shift $s' > s$ such that for some $k < q$,

$$P[1..k] = T[s'+1..s'+k], \tag{32.6}$$

where $s' + k = s + q$?

In other words, knowing that $P_q \sqsupset T_{s+q}$, we want the longest proper prefix $P_k$
of $P_q$ that is also a suffix of $T_{s+q}$. (Since $s' + k = s + q$, if we are given $s$
and $q$, then finding the smallest shift $s'$ is tantamount to finding the longest prefix
length $k$.) We add the difference $q - k$ in the lengths of these prefixes of $P$ to the
shift $s$ to arrive at our new shift $s'$, so that $s' = s + (q - k)$. In the best case, $k = 0$,
so that $s' = s + q$, and we immediately rule out shifts $s + 1, s + 2, \ldots, s + q - 1$.
In any case, at the new shift $s'$ we don’t need to compare the first $k$ characters of $P$
with the corresponding characters of $T$, since equation (32.6) guarantees that they
match.

We can precompute the necessary information by comparing the pattern against
itself, as Figure 32.10(c) demonstrates. Since $T[s'+1..s'+k]$ is part of the
known portion of the text, it is a suffix of the string $P_q$. Therefore, we can interpret
equation (32.6) as asking for the greatest $k < q$ such that $P_k \sqsupset P_q$. Then, the new
shift $s' = s + (q - k)$ is the next potentially valid shift. We will find it convenient to
store, for each value of $q$, the number $k$ of matching characters at the new shift $s'$,
rather than storing, say, $s' - s$.

Figure 32.10 The prefix function $\pi$. (a) The pattern $P$ = ababaca aligns with a text $T$ so that
the first $q = 5$ characters match. Matching characters, shown shaded, are connected by vertical lines.
(b) Using only our knowledge of the 5 matched characters, we can deduce that a shift of $s + 1$ is
invalid, but that a shift of $s' = s + 2$ is consistent with everything we know about the text and therefore
is potentially valid. (c) We can precompute useful information for such deductions by comparing the
pattern with itself. Here, we see that the longest prefix of $P$ that is also a proper suffix of $P_5$ is $P_3$.
We represent this precomputed information in the array $\pi$, so that $\pi[5] = 3$. Given that $q$ characters
have matched successfully at shift $s$, the next potentially valid shift is at $s' = s + (q - \pi[q])$ as shown
in part (b).

We formalize the information that we precompute as follows. Given a pattern
$P[1..m]$, the prefix function for the pattern $P$ is the function
$\pi : \{1, 2, \ldots, m\} \to \{0, 1, \ldots, m-1\}$ such that

$$\pi[q] = \max \{k : k < q \text{ and } P_k \sqsupset P_q\}.$$

That is, $\pi[q]$ is the length of the longest prefix of $P$ that is a proper suffix of $P_q$.
Figure 32.11(a) gives the complete prefix function $\pi$ for the pattern ababaca.

Figure 32.11 An illustration of Lemma 32.5 for the pattern $P$ = ababaca and $q = 5$. (a) The $\pi$
function for the given pattern. Since $\pi[5] = 3$, $\pi[3] = 1$, and $\pi[1] = 0$, by iterating $\pi$ we obtain
$\pi^*[5] = \{3, 1, 0\}$. (b) We slide the template containing the pattern $P$ to the right and note when some
prefix $P_k$ of $P$ matches up with some proper suffix of $P_5$; we get matches when $k = 3$, 1, and 0. In
the figure, the first row gives $P$, and the dotted vertical line is drawn just after $P_5$. Successive rows
show all the shifts of $P$ that cause some prefix $P_k$ of $P$ to match some suffix of $P_5$. Successfully
matched characters are shown shaded. Vertical lines connect aligned matching characters. Thus,
$\{k : k < 5 \text{ and } P_k \sqsupset P_5\} = \{3, 1, 0\}$. Lemma 32.5 claims that
$\pi^*[q] = \{k : k < q \text{ and } P_k \sqsupset P_q\}$ for all $q$.
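
The definition of $\pi$ can be evaluated directly, which is a convenient way to check small cases by hand. The Python sketch below does this by brute force. The function name `prefix_function_by_definition` and the 0-based list layout (entry `q - 1` holds $\pi[q]$) are choices made for this illustration and are not part of the text.

```python
def prefix_function_by_definition(pattern):
    """Return [pi[1], ..., pi[m]] computed straight from the definition.

    pi[q] is the largest k < q such that the prefix P_k is a suffix of P_q.
    This is a brute-force check, not the linear-time procedure.
    """
    m = len(pattern)
    pi = []
    for q in range(1, m + 1):
        prefix_q = pattern[:q]  # the prefix P_q
        # k = 0 always qualifies, because the empty string is a suffix of anything.
        best = max(k for k in range(q) if prefix_q.endswith(pattern[:k]))
        pi.append(best)
    return pi


print(prefix_function_by_definition("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```

For ababaca the fifth entry is 3, which agrees with the value $\pi[5] = 3$ quoted in the captions of Figures 32.10 and 32.11.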


The pseudocode below gives the Knuth-Morris-Pratt matching algorithm as
the procedure KMP-Matcher. For the most part, the procedure follows from
Finite-Automaton-Matcher, as we shall see. KMP-Matcher calls the auxiliary
procedure Compute-Prefix-Function to compute $\pi$.


KMP-Matcher(T, P)

 1  n = T.length
 2  m = P.length
 3  π = Compute-Prefix-Function(P)
 4  q = 0                                    // number of characters matched
 5  for i = 1 to n                           // scan the text from left to right
 6      while q > 0 and P[q + 1] ≠ T[i]
 7          q = π[q]                         // next character does not match
 8      if P[q + 1] == T[i]
 9          q = q + 1                        // next character matches
10      if q == m                            // is all of P matched?
11          print “Pattern occurs with shift” i − m
12          q = π[q]                         // look for the next match

Compute-Prefix-Function(P)

 1  m = P.length
 2  let π[1..m] be a new array
 3  π[1] = 0
 4  k = 0
 5  for q = 2 to m
 6      while k > 0 and P[k + 1] ≠ P[q]
 7          k = π[k]
 8      if P[k + 1] == P[q]
 9          k = k + 1
10      π[q] = k
11  return π

These two procedures have much in common, because both match a string against
the pattern $P$: KMP-Matcher matches the text $T$ against $P$, and
Compute-Prefix-Function matches $P$ against itself.
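
The two procedures translate almost line for line into Python. The sketch below is one possible rendering, under assumptions that are not in the text: strings are 0-based, so `pi[q - 1]` stores $\pi[q]$ and `pattern[q]` plays the role of $P[q + 1]$; matches are collected in a list instead of printed; and an empty pattern reports no matches.

```python
def compute_prefix_function(pattern):
    """Prefix function of `pattern`; pi[q - 1] holds the text's pi[q]."""
    m = len(pattern)
    pi = [0] * m
    k = 0  # length of the current matched prefix
    for q in range(1, m):  # 0-based index of the character being added
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]  # fall back to the next shorter candidate
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text, pattern):
    """Return every 0-based shift at which `pattern` occurs in `text`."""
    m = len(pattern)
    if m == 0:
        return []  # assumption of this sketch: an empty pattern matches nowhere
    pi = compute_prefix_function(pattern)
    shifts = []
    q = 0  # number of characters matched
    for i, ch in enumerate(text):
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]  # next character does not match
        if pattern[q] == ch:
            q += 1  # next character matches
        if q == m:  # all of the pattern matched
            shifts.append(i - m + 1)
            q = pi[q - 1]  # look for the next match
    return shifts


print(kmp_matcher("abababacaba", "ababaca"))  # [2]
print(kmp_matcher("aaaa", "aa"))              # [0, 1, 2]
```

The second call shows that overlapping occurrences are reported, which is the job of the fallback performed after a full match.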

We  begin  with  an  analysis  of  the  running  times  of  these  procedures.  Proving 
these  procedures  correct  will  be  more  complicated. 

Running-time  analysis 

The running time of Compute-Prefix-Function is $\Theta(m)$, which we show by
using the aggregate method of amortized analysis (see Section 17.1). The only
tricky part is showing that the while loop of lines 6-7 executes $O(m)$ times
altogether. We shall show that it makes at most $m - 1$ iterations. We start by making
some observations about $k$. First, line 4 starts $k$ at 0, and the only way that $k$
increases is by the increment operation in line 9, which executes at most once per
iteration of the for loop of lines 5-10. Thus, the total increase in $k$ is at most $m - 1$.
Second, since $k < q$ upon entering the for loop and each iteration of the loop
increments $q$, we always have $k < q$. Therefore, the assignments in lines 3 and 10
ensure that $\pi[q] < q$ for all $q = 1, 2, \ldots, m$, which means that each iteration of
the while loop decreases $k$. Third, $k$ never becomes negative. Putting these facts
together, we see that the total decrease in $k$ from the while loop is bounded from
above by the total increase in $k$ over all iterations of the for loop, which is $m - 1$.
Thus, the while loop iterates at most $m - 1$ times in all, and
Compute-Prefix-Function runs in time $\Theta(m)$.
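
The bound of $m - 1$ iterations can be observed empirically. The following Python sketch instruments the while loop with a counter. The function name `count_while_iterations` and the sample patterns are made up for this illustration, and the code uses 0-based indexing (`pi[q - 1]` holds $\pi[q]$).

```python
def count_while_iterations(pattern):
    """Run the prefix-function computation and count while-loop iterations."""
    m = len(pattern)
    pi = [0] * m
    k = 0
    iterations = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]    # each iteration strictly decreases k
            iterations += 1
        if pattern[k] == pattern[q]:
            k += 1           # k grows by at most 1 per for-loop iteration
        pi[q] = k
    return iterations


for p in ["ababaca", "aaaaab", "abcabcabd", "aaaaaaaa"]:
    # Aggregate bound from the analysis: at most m - 1 iterations in total.
    assert count_while_iterations(p) <= len(p) - 1
    print(p, count_while_iterations(p))
```

The pattern aaaaab shows the amortized behaviour: the final character triggers four fallbacks in a row, but only because the four preceding characters each increased `k` by one.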

Exercise 32.4-4 asks you to show, by a similar aggregate analysis, that the matching
time of KMP-Matcher is $\Theta(n)$.

Compared with Finite-Automaton-Matcher, by using $\pi$ rather than $\delta$, we
have reduced the time for preprocessing the pattern from $O(m\,|\Sigma|)$ to $\Theta(m)$, while
keeping the actual matching time bounded by $\Theta(n)$.

Correctness of the prefix-function computation

We shall see a little later that the prefix function $\pi$ helps us simulate the transition
function $\delta$ in a string-matching automaton. But first, we need to prove that the
procedure Compute-Prefix-Function does indeed compute the prefix function
correctly. In order to do so, we will need to find all prefixes $P_k$ that are proper
suffixes of a given prefix $P_q$. The value of $\pi[q]$ gives us the longest such prefix, but
the following lemma, illustrated in Figure 32.11, shows that by iterating the prefix
function $\pi$, we can indeed enumerate all the prefixes $P_k$ that are proper suffixes
of $P_q$. Let

$$\pi^*[q] = \{\pi[q], \pi^{(2)}[q], \pi^{(3)}[q], \ldots, \pi^{(t)}[q]\},$$

where $\pi^{(i)}[q]$ is defined in terms of functional iteration, so that $\pi^{(0)}[q] = q$ and
$\pi^{(i)}[q] = \pi[\pi^{(i-1)}[q]]$ for $i \ge 1$, and where the sequence in $\pi^*[q]$ stops upon
reaching $\pi^{(t)}[q] = 0$.
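
Functional iteration of $\pi$ is short to write down in code. The Python sketch below lists the members of $\pi^*[q]$ in the order in which iteration produces them. The helper name `pi_star` is hypothetical, and the list `pi` is 0-based (`pi[q - 1]` holds $\pi[q]$); the sample values are the prefix function of ababaca.

```python
def pi_star(pi, q):
    """Return [pi[q], pi^(2)[q], ...], stopping once the value 0 is reached."""
    values = []
    while q > 0:
        q = pi[q - 1]  # one more application of the prefix function
        values.append(q)
    return values


pi = [0, 0, 1, 2, 3, 0, 1]  # prefix function of "ababaca", stored 0-based
print(pi_star(pi, 5))  # [3, 1, 0]
```

Because $\pi[q] < q$, the values strictly decrease, so the loop always terminates and the list comes out in descending order.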

Lemma 32.5 (Prefix-function iteration lemma)

Let $P$ be a pattern of length $m$ with prefix function $\pi$. Then, for $q = 1, 2, \ldots, m$,
we have $\pi^*[q] = \{k : k < q \text{ and } P_k \sqsupset P_q\}$.

Proof We first prove that $\pi^*[q] \subseteq \{k : k < q \text{ and } P_k \sqsupset P_q\}$ or, equivalently,

$$i \in \pi^*[q] \text{ implies } P_i \sqsupset P_q. \tag{32.7}$$

If $i \in \pi^*[q]$, then $i = \pi^{(u)}[q]$ for some $u > 0$. We prove equation (32.7) by
induction on $u$. For $u = 1$, we have $i = \pi[q]$, and the claim follows since $i < q$
and $P_{\pi[q]} \sqsupset P_q$ by the definition of $\pi$. Using the relations $\pi[i] < i$ and $P_{\pi[i]} \sqsupset P_i$
and the transitivity of $<$ and $\sqsupset$ establishes the claim for all $i$ in $\pi^*[q]$. Therefore,
$\pi^*[q] \subseteq \{k : k < q \text{ and } P_k \sqsupset P_q\}$.

We now prove that $\{k : k < q \text{ and } P_k \sqsupset P_q\} \subseteq \pi^*[q]$ by contradiction.
Suppose to the contrary that the set $\{k : k < q \text{ and } P_k \sqsupset P_q\} - \pi^*[q]$ is nonempty,
and let $j$ be the largest number in the set. Because $\pi[q]$ is the largest value in
$\{k : k < q \text{ and } P_k \sqsupset P_q\}$ and $\pi[q] \in \pi^*[q]$, we must have $j < \pi[q]$, and so we
let $j'$ denote the smallest integer in $\pi^*[q]$ that is greater than $j$. (We can choose
$j' = \pi[q]$ if no other number in $\pi^*[q]$ is greater than $j$.) We have $P_j \sqsupset P_q$ because
$j \in \{k : k < q \text{ and } P_k \sqsupset P_q\}$, and from $j' \in \pi^*[q]$ and equation (32.7), we have
$P_{j'} \sqsupset P_q$. Thus, $P_j \sqsupset P_{j'}$ by Lemma 32.1, and $j$ is the largest value less than $j'$
with this property. Therefore, we must have $\pi[j'] = j$ and, since $j' \in \pi^*[q]$, we
must have $j \in \pi^*[q]$ as well. This contradiction proves the lemma. ■

The algorithm Compute-Prefix-Function computes $\pi[q]$, in order, for
$q = 1, 2, \ldots, m$. Setting $\pi[1]$ to 0 in line 3 of Compute-Prefix-Function is
certainly correct, since $\pi[q] < q$ for all $q$. We shall use the following lemma and
its corollary to prove that Compute-Prefix-Function computes $\pi[q]$ correctly
for $q > 1$.

Lemma 32.6

Let $P$ be a pattern of length $m$, and let $\pi$ be the prefix function for $P$. For
$q = 1, 2, \ldots, m$, if $\pi[q] > 0$, then $\pi[q] - 1 \in \pi^*[q - 1]$.

Proof Let $r = \pi[q] > 0$, so that $r < q$ and $P_r \sqsupset P_q$; thus, $r - 1 < q - 1$ and
$P_{r-1} \sqsupset P_{q-1}$ (by dropping the last character from $P_r$ and $P_q$, which we can do
because $r > 0$). By Lemma 32.5, therefore, $r - 1 \in \pi^*[q - 1]$. Thus, we have
$\pi[q] - 1 = r - 1 \in \pi^*[q - 1]$. ■

For $q = 2, 3, \ldots, m$, define the subset $E_{q-1} \subseteq \pi^*[q - 1]$ by

$$\begin{aligned}
E_{q-1} &= \{k \in \pi^*[q-1] : P[k+1] = P[q]\} \\
        &= \{k : k < q-1 \text{ and } P_k \sqsupset P_{q-1} \text{ and } P[k+1] = P[q]\} \quad \text{(by Lemma 32.5)} \\
        &= \{k : k < q-1 \text{ and } P_{k+1} \sqsupset P_q\}.
\end{aligned}$$

The set $E_{q-1}$ consists of the values $k < q - 1$ for which $P_k \sqsupset P_{q-1}$ and for which,
because $P[k+1] = P[q]$, we have $P_{k+1} \sqsupset P_q$. Thus, $E_{q-1}$ consists of those
values $k \in \pi^*[q - 1]$ such that we can extend $P_k$ to $P_{k+1}$ and get a proper suffix
of $P_q$.

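Corollary 32.7

Let $P$ be a pattern of length $m$, and let $\pi$ be the prefix function for $P$. For
$q = 2, 3, \ldots, m$,

$$\pi[q] = \begin{cases}
0 & \text{if } E_{q-1} = \emptyset, \\
1 + \max \{k \in E_{q-1}\} & \text{if } E_{q-1} \ne \emptyset.
\end{cases}$$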

Proof If $E_{q-1}$ is empty, there is no $k \in \pi^*[q - 1]$ (including $k = 0$) for which
we can extend $P_k$ to $P_{k+1}$ and get a proper suffix of $P_q$. Therefore $\pi[q] = 0$.

If $E_{q-1}$ is nonempty, then for each $k \in E_{q-1}$ we have $k + 1 < q$ and $P_{k+1} \sqsupset P_q$.
Therefore, from the definition of $\pi[q]$, we have

$$\pi[q] \ge 1 + \max \{k \in E_{q-1}\}. \tag{32.8}$$

Note that $\pi[q] > 0$. Let $r = \pi[q] - 1$, so that $r + 1 = \pi[q]$ and therefore
$P_{r+1} \sqsupset P_q$. Since $r + 1 > 0$, we have $P[r + 1] = P[q]$. Furthermore,
by Lemma 32.6, we have $r \in \pi^*[q - 1]$. Therefore, $r \in E_{q-1}$, and so
$r \le \max \{k \in E_{q-1}\}$ or, equivalently,

$$\pi[q] \le 1 + \max \{k \in E_{q-1}\}. \tag{32.9}$$

Combining equations (32.8) and (32.9) completes the proof.

We now finish the proof that Compute-Prefix-Function computes $\pi$ correctly.
In the procedure Compute-Prefix-Function, at the start of each iteration
of the for loop of lines 5-10, we have that $k = \pi[q - 1]$. This condition
is enforced by lines 3 and 4 when the loop is first entered, and it remains true in
each successive iteration because of line 10. Lines 6-9 adjust $k$ so that it becomes
the correct value of $\pi[q]$. The while loop of lines 6-7 searches through all values
$k \in \pi^*[q - 1]$ until it finds a value of $k$ for which $P[k + 1] = P[q]$; at that point,
$k$ is the largest value in the set $E_{q-1}$, so that, by Corollary 32.7, we can set $\pi[q]$
to $k + 1$. If the while loop cannot find a $k \in \pi^*[q - 1]$ such that $P[k + 1] = P[q]$,
then $k$ equals 0 at line 8. If $P[1] = P[q]$, then we should set both $k$ and $\pi[q]$ to 1;
otherwise we should leave $k$ alone and set $\pi[q]$ to 0. Lines 8-10 set $k$ and $\pi[q]$
correctly in either case. This completes our proof of the correctness of
Compute-Prefix-Function.
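
Corollary 32.7 can itself be read as a recipe for $\pi[q]$: build $\pi^*[q - 1]$, filter it down to $E_{q-1}$, and take one more than the maximum. The Python sketch below follows that recipe literally, materializing each set instead of walking it lazily as the while loop of lines 6-7 does. The function name `prefix_function_via_corollary` is hypothetical. The array `pi` is kept 1-based to mirror the text (index 0 is unused), while the Python string stays 0-based, so `pattern[k]` is $P[k + 1]$ and `pattern[q - 1]` is $P[q]$.

```python
def prefix_function_via_corollary(pattern):
    """Compute pi[1..m] by applying Corollary 32.7 for q = 2, ..., m."""
    m = len(pattern)
    pi = [0] * (m + 1)  # pi[0] is unused; pi[1] = 0 as in line 3
    for q in range(2, m + 1):
        # Build pi*[q - 1] by iterating pi until it reaches 0.
        star = []
        k = q - 1
        while k > 0:
            k = pi[k]
            star.append(k)
        # E_{q-1}: members k of pi*[q - 1] with P[k + 1] == P[q].
        extendable = [k for k in star if pattern[k] == pattern[q - 1]]
        pi[q] = 1 + max(extendable) if extendable else 0
    return pi[1:]


print(prefix_function_via_corollary("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```

This version does more work than Compute-Prefix-Function, since it always builds the whole of $\pi^*[q - 1]$, but it returns the same values, which is what the correctness argument above establishes.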

Correctness  of  the  Knuth-Morris-Pratt  algorithm 

We can think of the procedure KMP-Matcher as a reimplemented version of
the procedure Finite-Automaton-Matcher, but using the prefix function $\pi$
to compute state transitions. Specifically, we shall prove that in the $i$th iteration of
the for loops of both KMP-Matcher and Finite-Automaton-Matcher, the
state $q$ has the same value when we test for equality with $m$ (at line 10 in
KMP-Matcher and at line 5 in Finite-Automaton-Matcher). Once we have
argued that KMP-Matcher simulates the behavior of Finite-Automaton-Matcher,
the correctness of KMP-Matcher follows from the correctness of
Finite-Automaton-Matcher (though we shall see a little later why line 12 in
KMP-Matcher is necessary).

Before we formally prove that KMP-Matcher correctly simulates
Finite-Automaton-Matcher, let’s take a moment to understand how the prefix
function $\pi$ replaces the $\delta$ transition function. Recall that when a string-matching
automaton is in state $q$ and it scans a character $a = T[i]$, it moves to a new
state $\delta(q, a)$. If $a = P[q + 1]$, so that $a$ continues to match the pattern, then
$\delta(q, a) = q + 1$. Otherwise, $a \ne P[q + 1]$, so that $a$ does not continue to match
the pattern, and $0 \le \delta(q, a) \le q$. In the first case, when $a$ continues to match,
KMP-Matcher moves to state $q + 1$ without referring to the $\pi$ function: the
while loop test in line 6 comes up false the first time, the test in line 8 comes up
true, and line 9 increments $q$.

The $\pi$ function comes into play when the character $a$ does not continue to match
the pattern, so that the new state $\delta(q, a)$ is either $q$ or to the left of $q$ along the spine
of the automaton. The while loop of lines 6-7 in KMP-Matcher iterates through
the states in $\pi^*[q]$, stopping either when it arrives in a state, say $q'$, such that $a$
matches $P[q' + 1]$ or $q'$ has gone all the way down to 0. If $a$ matches $P[q' + 1]$,
then line 9 sets the new state to $q' + 1$, which should equal $\delta(q, a)$ for the simulation
to work correctly. In other words, the new state $\delta(q, a)$ should be either state 0 or
one greater than some state in $\pi^*[q]$.

Let’s look at the example in Figures 32.7 and 32.11, which are for the pattern
$P$ = ababaca. Suppose that the automaton is in state $q = 5$; the states in
$\pi^*[5]$ are, in descending order, 3, 1, and 0. If the next character scanned is c, then
we can easily see that the automaton moves to state $\delta(5, \text{c}) = 6$ in both
Finite-Automaton-Matcher and KMP-Matcher. Now suppose that the next
character scanned is instead b, so that the automaton should move to state $\delta(5, \text{b}) = 4$.
The while loop in KMP-Matcher exits having executed line 7 once, and it
arrives in state $q' = \pi[5] = 3$. Since $P[q' + 1] = P[4] = \text{b}$, the test in line 8
comes up true, and KMP-Matcher moves to the new state $q' + 1 = 4 = \delta(5, \text{b})$.
Finally, suppose that the next character scanned is instead a, so that the
automaton should move to state $\delta(5, \text{a}) = 1$. The first three times that the test in line 6
executes, the test comes up true. The first time, we find that $P[6] = \text{c} \ne \text{a}$, and
KMP-Matcher moves to state $\pi[5] = 3$ (the first state in $\pi^*[5]$). The second
time, we find that $P[4] = \text{b} \ne \text{a}$ and move to state $\pi[3] = 1$ (the second state
in $\pi^*[5]$). The third time, we find that $P[2] = \text{b} \ne \text{a}$ and move to state $\pi[1] = 0$
(the last state in $\pi^*[5]$). The while loop exits once it arrives in state $q' = 0$. Now,
line 8 finds that $P[q' + 1] = P[1] = \text{a}$, and line 9 moves the automaton to the new
state $q' + 1 = 1 = \delta(5, \text{a})$.
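
The three transitions in this example can be replayed in code. The Python sketch below implements one step of the KMP state update and compares it with a brute-force transition function taken from the definition $\delta(q, a) = \sigma(P_q a)$. Both function names are hypothetical, indexing is 0-based (`pi[q - 1]` holds $\pi[q]$), the pattern is assumed nonempty, and the alphabet {a, b, c} is an assumption chosen to fit the example.

```python
def kmp_step(pattern, pi, q, a):
    """New state after reading character `a` in state `q`, using only pi."""
    if q == len(pattern):
        q = pi[q - 1]  # the fallback of line 12 after a complete match
    while q > 0 and pattern[q] != a:
        q = pi[q - 1]  # walk down through the states in pi*[q]
    if pattern[q] == a:
        q += 1
    return q


def delta_by_definition(pattern, q, a):
    """Length of the longest prefix of `pattern` that is a suffix of P_q + a."""
    s = pattern[:q] + a
    for k in range(min(len(pattern), len(s)), -1, -1):
        if s.endswith(pattern[:k]):
            return k  # k = 0 always succeeds, so the loop always returns


pattern = "ababaca"
pi = [0, 0, 1, 2, 3, 0, 1]  # prefix function of the pattern, stored 0-based

for ch in "cba":
    print(ch, kmp_step(pattern, pi, 5, ch))  # c 6, then b 4, then a 1

# The two ways of computing the transition agree on every state and character.
for q in range(len(pattern) + 1):
    for ch in "abc":
        assert kmp_step(pattern, pi, q, ch) == delta_by_definition(pattern, q, ch)
```

The brute-force function is only a reference for checking; building the whole table this way is what KMP-Matcher avoids.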

Thus, our intuition is that KMP-Matcher iterates through the states in $\pi^*[q]$ in
decreasing order, stopping at some state $q'$ and then possibly moving to state $q' + 1$.
Although that might seem like a lot of work just to simulate computing $\delta(q, a)$,
bear in mind that asymptotically, KMP-Matcher is no slower than
Finite-Automaton-Matcher.

We are now ready to formally prove the correctness of the Knuth-Morris-Pratt
algorithm. By Theorem 32.4, we have that $q = \sigma(T_i)$ after each time we execute
line 4 of Finite-Automaton-Matcher. Therefore, it suffices to show that the
same property holds with regard to the for loop in KMP-Matcher. The proof
proceeds by induction on the number of loop iterations. Initially, both procedures
set $q$ to 0 as they enter their respective for loops for the first time. Consider
iteration $i$ of the for loop in KMP-Matcher, and let $q'$ be the state at the start of this loop
iteration. By the inductive hypothesis, we have $q' = \sigma(T_{i-1})$. We need to show
that $q = \sigma(T_i)$ at line 10. (Again, we shall handle line 12 separately.)

When we consider the character $T[i]$, the longest prefix of $P$ that is a suffix of $T_i$
is either $P_{q'+1}$ (if $P[q' + 1] = T[i]$) or some prefix (not necessarily proper, and
possibly empty) of $P_{q'}$. We consider separately the three cases in which $\sigma(T_i) = 0$,
$\sigma(T_i) = q' + 1$, and $0 < \sigma(T_i) \le q'$.

•  If $\sigma(T_i) = 0$, then $P_0 = \varepsilon$ is the only prefix of $P$ that is a suffix of $T_i$. The while
loop of lines 6-7 iterates through the values in $\pi^*[q']$, but although $P_q \sqsupset T_{i-1}$ for
every $q \in \pi^*[q']$, the loop never finds a $q$ such that $P[q + 1] = T[i]$. The loop
terminates when $q$ reaches 0, and of course line 9 does not execute. Therefore,
$q = 0$ at line 10, so that $q = \sigma(T_i)$.

•  If $\sigma(T_i) = q' + 1$, then $P[q' + 1] = T[i]$, and the while loop test in line 6
fails the first time through. Line 9 executes, incrementing $q$ so that afterward
we have $q = q' + 1 = \sigma(T_i)$.

•  If $0 < \sigma(T_i) \le q'$, then the while loop of lines 6-7 iterates at least once,
checking in decreasing order each value $q \in \pi^*[q']$ until it stops at some $q < q'$.
Thus, $P_q$ is the longest prefix of $P_{q'}$ for which $P[q + 1] = T[i]$, so that when the
while loop terminates, $q + 1 = \sigma(P_{q'} T[i])$. Since $q' = \sigma(T_{i-1})$, Lemma 32.3
implies that $\sigma(T_{i-1} T[i]) = \sigma(P_{q'} T[i])$. Thus, we have

$$\begin{aligned}
q + 1 &= \sigma(P_{q'} T[i]) \\
      &= \sigma(T_{i-1} T[i]) \\
      &= \sigma(T_i)
\end{aligned}$$

when the while loop terminates. After line 9 increments $q$, we have $q = \sigma(T_i)$.

Line 12 is necessary in KMP-Matcher, because otherwise, we might reference
$P[m + 1]$ on line 6 after finding an occurrence of $P$. (The argument that
$q = \sigma(T_{i-1})$ upon the next execution of line 6 remains valid by the hint given in
Exercise 32.4-8: $\delta(m, a) = \delta(\pi[m], a)$ or, equivalently, $\sigma(Pa) = \sigma(P_{\pi[m]} a)$ for
any $a \in \Sigma$.) The remaining argument for the correctness of the Knuth-Morris-Pratt
algorithm follows from the correctness of Finite-Automaton-Matcher,
since we have shown that KMP-Matcher simulates the behavior of
Finite-Automaton-Matcher.

Exercises 


32.4-1 

Compute the prefix function $\pi$ for the pattern ababbabbabbababbabb.


32.4-2 

Give an upper bound on the size of $\pi^*[q]$ as a function of $q$. Give an example to
show that your bound is tight.


32.4-3 

Explain how to determine the occurrences of pattern $P$ in the text $T$ by examining
the $\pi$ function for the string $PT$ (the string of length $m + n$ that is the concatenation
of $P$ and $T$).

32.4-4

Use an aggregate analysis to show that the running time of KMP-Matcher
is $\Theta(n)$.


32.4-5 

Use a potential function to show that the running time of KMP-Matcher is $\Theta(n)$.


32.4-6 

Show how to improve KMP-Matcher by replacing the occurrence of $\pi$ in line 7
(but not line 12) by $\pi'$, where $\pi'$ is defined recursively for $q = 1, 2, \ldots, m - 1$ by
the equation

$$\pi'[q] = \begin{cases}
0 & \text{if } \pi[q] = 0, \\
\pi'[\pi[q]] & \text{if } \pi[q] \ne 0 \text{ and } P[\pi[q] + 1] = P[q + 1], \\
\pi[q] & \text{if } \pi[q] \ne 0 \text{ and } P[\pi[q] + 1] \ne P[q + 1].
\end{cases}$$

Explain why the modified algorithm is correct, and explain in what sense this
change constitutes an improvement.


32.4-7

Give a linear-time algorithm to determine whether a text $T$ is a cyclic rotation of
another string $T'$. For example, arc and car are cyclic rotations of each other.

32.4-8 ★

Give an $O(m\,|\Sigma|)$-time algorithm for computing the transition function $\delta$ for the
string-matching automaton corresponding to a given pattern $P$. (Hint: Prove that
$\delta(q, a) = \delta(\pi[q], a)$ if $q = m$ or $P[q + 1] \ne a$.)


Problems 


32-1  String  matching  based  on  repetition  factors 

Let $y^i$ denote the concatenation of string $y$ with itself $i$ times. For example,
$(\text{ab})^3 = \text{ababab}$. We say that a string $x \in \Sigma^*$ has repetition factor $r$ if $x = y^r$
for some string $y \in \Sigma^*$ and some $r > 0$. Let $\rho(x)$ denote the largest $r$ such that $x$
has repetition factor $r$.
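
To make the definition concrete, the Python sketch below evaluates $\rho(x)$ by trying every candidate block length. It is a direct, deliberately naive reading of the definition, not the efficient algorithm that part (a) asks for. The function name `repetition_factor` is hypothetical, and the sketch assumes a nonempty input.

```python
def repetition_factor(x):
    """Largest r such that x equals some string y repeated r times."""
    n = len(x)
    if n == 0:
        raise ValueError("this sketch only handles nonempty strings")
    # The shortest block y that tiles x gives the largest repetition count.
    for block_len in range(1, n + 1):
        if n % block_len == 0 and x[:block_len] * (n // block_len) == x:
            return n // block_len


print(repetition_factor("ababab"))  # 3
print(repetition_factor("abaab"))   # 1
```

The loop always returns, because the block length `n` (the string itself, repeated once) satisfies the test.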

a.  Give an efficient algorithm that takes as input a pattern $P[1..m]$ and computes
the value $\rho(P_i)$ for $i = 1, 2, \ldots, m$. What is the running time of your
algorithm?

b.  For any pattern $P[1..m]$, let $\rho^*(P)$ be defined as $\max_{1 \le i \le m} \rho(P_i)$. Prove that if
the pattern $P$ is chosen randomly from the set of all binary strings of length $m$,
then the expected value of $\rho^*(P)$ is $O(1)$.

c.  Argue that the following string-matching algorithm correctly finds all
occurrences of pattern $P$ in a text $T[1..n]$ in time $O(\rho^*(P)\,n + m)$:

Repetition-Matcher(P, T)

 1  m = P.length
 2  n = T.length
 3  k = 1 + ρ*(P)
 4  q = 0
 5  s = 0
 6  while s ≤ n − m
 7      if T[s + q + 1] == P[q + 1]
 8          q = q + 1
 9          if q == m
10              print “Pattern occurs with shift” s
11      if q == m or T[s + q + 1] ≠ P[q + 1]
12          s = s + max(1, ⌈q/k⌉)
13          q = 0

This algorithm is due to Galil and Seiferas. By extending these ideas greatly,
they obtained a linear-time string-matching algorithm that uses only $O(1)$ storage
beyond what is required for $P$ and $T$.


Chapter  notes 

The relation of string matching to the theory of finite automata is discussed by
Aho, Hopcroft, and Ullman [5]. The Knuth-Morris-Pratt algorithm [214] was
invented independently by Knuth and Pratt and by Morris; they published their
work jointly. Reingold, Urban, and Gries [294] give an alternative treatment of the
Knuth-Morris-Pratt algorithm. The Rabin-Karp algorithm was proposed by Karp
and Rabin [201]. Galil and Seiferas [126] give an interesting deterministic
linear-time string-matching algorithm that uses only $O(1)$ space beyond that required to
store the pattern and text.


33 Computational Geometry

Computational geometry is the branch of computer science that studies algorithms
for solving geometric problems. In modern engineering and mathematics,
computational geometry has applications in such diverse fields as computer graphics,
robotics, VLSI design, computer-aided design, molecular modeling, metallurgy,
manufacturing, textile layout, forestry, and statistics. The input to a
computational-geometry problem is typically a description of a set of geometric objects, such as
a set of points, a set of line segments, or the vertices of a polygon in counterclockwise
order. The output is often a response to a query about the objects, such as
whether any of the lines intersect, or perhaps a new geometric object, such as the
convex hull (smallest enclosing convex polygon) of the set of points.

In this chapter, we look at a few computational-geometry algorithms in two
dimensions, that is, in the plane. We represent each input object by a set of
points $\{p_1, p_2, p_3, \ldots\}$, where each $p_i = (x_i, y_i)$ and $x_i, y_i \in \mathbb{R}$. For example,
we represent an $n$-vertex polygon $P$ by a sequence $\langle p_0, p_1, p_2, \ldots, p_{n-1} \rangle$
of its vertices in order of their appearance on the boundary of $P$. Computational
geometry can also apply to three dimensions, and even higher-dimensional spaces,
but such problems and their solutions can be very difficult to visualize. Even in
two dimensions, however, we can see a good sample of computational-geometry
techniques.

Section 33.1 shows how to answer basic questions about line segments
efficiently and accurately: whether one segment is clockwise or counterclockwise
from another that shares an endpoint, which way we turn when traversing two
adjoining line segments, and whether two line segments intersect. Section 33.2
presents a technique called “sweeping” that we use to develop an $O(n \lg n)$-time
algorithm for determining whether a set of $n$ line segments contains any
intersections. Section 33.3 gives two “rotational-sweep” algorithms that compute the
convex hull (smallest enclosing convex polygon) of a set of $n$ points: Graham’s
scan, which runs in time $O(n \lg n)$, and Jarvis’s march, which takes $O(nh)$ time,
where $h$ is the number of vertices of the convex hull. Finally, Section 33.4 gives
an $O(n \lg n)$-time divide-and-conquer algorithm for finding the closest pair of
points in a set of $n$ points in the plane.


33.1  Line-segment  properties 

Several of the computational-geometry algorithms in this chapter require answers
to questions about the properties of line segments. A convex combination of two
distinct points $p_1 = (x_1, y_1)$ and $p_2 = (x_2, y_2)$ is any point $p_3 = (x_3, y_3)$ such
that for some $\alpha$ in the range $0 \le \alpha \le 1$, we have $x_3 = \alpha x_1 + (1 - \alpha) x_2$ and
$y_3 = \alpha y_1 + (1 - \alpha) y_2$. We also write that $p_3 = \alpha p_1 + (1 - \alpha) p_2$. Intuitively, $p_3$
is any point that is on the line passing through $p_1$ and $p_2$ and is on or between $p_1$
and $p_2$ on the line. Given two distinct points $p_1$ and $p_2$, the line segment $\overline{p_1 p_2}$
is the set of convex combinations of $p_1$ and $p_2$. We call $p_1$ and $p_2$ the endpoints
of segment $\overline{p_1 p_2}$. Sometimes the ordering of $p_1$ and $p_2$ matters, and we speak of
the directed segment $\overrightarrow{p_1 p_2}$. If $p_1$ is the origin $(0, 0)$, then we can treat the directed
segment $\overrightarrow{p_1 p_2}$ as the vector $p_2$.

In  this  section,  we  shall  explore  the  following  questions: 

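1.  Given two directed segments $\overrightarrow{p_0 p_1}$ and $\overrightarrow{p_0 p_2}$, is $\overrightarrow{p_0 p_1}$ clockwise from $\overrightarrow{p_0 p_2}$
with respect to their common endpoint $p_0$?

2.  Given two line segments $\overline{p_0 p_1}$ and $\overline{p_1 p_2}$, if we traverse $\overline{p_0 p_1}$ and then $\overline{p_1 p_2}$,
do we make a left turn at point $p_1$?

3.  Do line segments $\overline{p_1 p_2}$ and $\overline{p_3 p_4}$ intersect?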

There  are  no  restrictions  on  the  given  points. 

We can answer each question in $O(1)$ time, which should come as no surprise
since the input size of each question is $O(1)$. Moreover, our methods use only
additions, subtractions, multiplications, and comparisons. We need neither division
nor trigonometric functions, both of which can be computationally expensive and
prone to problems with round-off error. For example, the “straightforward” method
of determining whether two segments intersect uses division to find the point of
intersection: compute the line equation of the form $y = mx + b$ for each segment
($m$ is the slope and $b$ is the $y$-intercept), find the point of intersection of the lines,
and check whether this point is on both segments. When the segments are
nearly parallel, this method is very sensitive to the precision of the division
operation on real computers. The method in this section, which avoids division, is much
more accurate.

Figure 33.1 (a) The cross product of vectors $p_1$ and $p_2$ is the signed area of the parallelogram.
(b) The lightly shaded region contains vectors that are clockwise from $p$. The darkly shaded region
contains vectors that are counterclockwise from $p$.

Cross  products 

Computing cross products lies at the heart of our line-segment methods. Consider
vectors $p_1$ and $p_2$, shown in Figure 33.1(a). We can interpret the cross product
$p_1 \times p_2$ as the signed area of the parallelogram formed by the points (0, 0), $p_1$, $p_2$,
and $p_1 + p_2 = (x_1 + x_2, y_1 + y_2)$. An equivalent, but more useful, definition gives
the cross product as the determinant of a matrix:¹

$$p_1 \times p_2 = \det \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = x_1 y_2 - x_2 y_1 = -p_2 \times p_1 .$$

If $p_1 \times p_2$ is positive, then $p_1$ is clockwise from $p_2$ with respect to the origin (0, 0);
if this cross product is negative, then $p_1$ is counterclockwise from $p_2$. (See
Exercise 33.1-1.) Figure 33.1(b) shows the clockwise and counterclockwise regions
relative to a vector $p$. A boundary condition arises if the cross product is 0; in this
case, the vectors are colinear, pointing in either the same or opposite directions.

To determine whether a directed segment $\overrightarrow{p_0p_1}$ is closer to a directed
segment $\overrightarrow{p_0p_2}$ in a clockwise direction or in a counterclockwise direction with respect
to their common endpoint $p_0$, we simply translate to use $p_0$ as the origin. That
is, we let $p_1 - p_0$ denote the vector $p_1' = (x_1', y_1')$, where $x_1' = x_1 - x_0$ and
$y_1' = y_1 - y_0$, and we define $p_2 - p_0$ similarly. We then compute the cross product


¹Actually, the cross product is a three-dimensional concept. It is a vector that is perpendicular to
both $p_1$ and $p_2$ according to the “right-hand rule” and whose magnitude is $|x_1 y_2 - x_2 y_1|$. In this
chapter, however, we find it convenient to treat the cross product simply as the value $x_1 y_2 - x_2 y_1$.
Figure 33.2 Using the cross product to determine how consecutive line segments $\overline{p_0p_1}$ and $\overline{p_1p_2}$
turn at point $p_1$. We check whether the directed segment $\overrightarrow{p_0p_2}$ is clockwise or counterclockwise
relative to the directed segment $\overrightarrow{p_0p_1}$. (a) If counterclockwise, the points make a left turn. (b) If
clockwise, they make a right turn.


$$(p_1 - p_0) \times (p_2 - p_0) = (x_1 - x_0)(y_2 - y_0) - (x_2 - x_0)(y_1 - y_0) .$$

If this cross product is positive, then $\overrightarrow{p_0p_1}$ is clockwise from $\overrightarrow{p_0p_2}$; if negative, it
is counterclockwise.
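
As an illustration, the two cross-product formulas above can be sketched in Python. The function names `cross` and `cross_about` and the tuple representation of points are assumptions made for this example; they are not part of the text's pseudocode.

```python
def cross(p1, p2):
    """Cross product p1 x p2 = x1*y2 - x2*y1 of two vectors from the origin."""
    return p1[0] * p2[1] - p2[0] * p1[1]


def cross_about(p0, p1, p2):
    """Cross product (p1 - p0) x (p2 - p0), i.e. with p0 translated to the origin."""
    return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])


# Positive: (1, 0) is clockwise from (0, 1) with respect to the origin.
assert cross((1, 0), (0, 1)) > 0
# Swapping the operands negates the result.
assert cross((0, 1), (1, 0)) == -cross((1, 0), (0, 1))
# Same test with the common endpoint moved to (1, 1).
assert cross_about((1, 1), (2, 1), (1, 2)) > 0
```

With integer coordinates, both functions use only subtraction and multiplication, so the sign test is exact.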

Determining  whether  consecutive  segments  turn  left  or  right 

Our next question is whether two consecutive line segments $\overline{p_0p_1}$ and $\overline{p_1p_2}$ turn
left or right at point $p_1$. Equivalently, we want a method to determine which way a
given angle $\angle p_0p_1p_2$ turns. Cross products allow us to answer this question
without computing the angle. As Figure 33.2 shows, we simply check whether directed
segment $\overrightarrow{p_0p_2}$ is clockwise or counterclockwise relative to directed segment $\overrightarrow{p_0p_1}$.
To do so, we compute the cross product $(p_2 - p_0) \times (p_1 - p_0)$. If the sign of
this cross product is negative, then $\overrightarrow{p_0p_2}$ is counterclockwise with respect to $\overrightarrow{p_0p_1}$,
and thus we make a left turn at $p_1$. A positive cross product indicates a clockwise
orientation and a right turn. A cross product of 0 means that points $p_0$, $p_1$, and $p_2$
are colinear.
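
The turn test just described can be sketched as follows. This is a minimal Python illustration; the function name `turn` and its string return values are assumptions chosen for readability.

```python
def turn(p0, p1, p2):
    """Classify the turn made at p1 when traversing p0 -> p1 -> p2.

    Uses the sign of (p2 - p0) x (p1 - p0): negative means a left turn,
    positive a right turn, and zero means the three points are colinear.
    """
    c = (p2[0] - p0[0]) * (p1[1] - p0[1]) - (p1[0] - p0[0]) * (p2[1] - p0[1])
    if c < 0:
        return "left"
    if c > 0:
        return "right"
    return "colinear"


assert turn((0, 0), (1, 0), (1, 1)) == "left"
assert turn((0, 0), (1, 0), (1, -1)) == "right"
assert turn((0, 0), (1, 0), (2, 0)) == "colinear"
```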

Determining  whether  two  line  segments  intersect 

To determine whether two line segments intersect, we check whether each segment
straddles the line containing the other. A segment $\overline{p_1p_2}$ straddles a line if point $p_1$
lies on one side of the line and point $p_2$ lies on the other side. A boundary case
arises if $p_1$ or $p_2$ lies directly on the line. Two line segments intersect if and only
if either (or both) of the following conditions holds:

1. Each segment straddles the line containing the other.

2. An endpoint of one segment lies on the other segment. (This condition comes
from the boundary case.)
The following procedures implement this idea. Segments-Intersect returns
TRUE if segments $\overline{p_1p_2}$ and $\overline{p_3p_4}$ intersect and FALSE if they do not. It calls
the subroutines Direction, which computes relative orientations using the
cross-product method above, and On-Segment, which determines whether a point
known to be colinear with a segment lies on that segment.

Segments-Intersect($p_1$, $p_2$, $p_3$, $p_4$)
 1  $d_1$ = Direction($p_3$, $p_4$, $p_1$)
 2  $d_2$ = Direction($p_3$, $p_4$, $p_2$)
 3  $d_3$ = Direction($p_1$, $p_2$, $p_3$)
 4  $d_4$ = Direction($p_1$, $p_2$, $p_4$)
 5  if (($d_1 > 0$ and $d_2 < 0$) or ($d_1 < 0$ and $d_2 > 0$)) and
        (($d_3 > 0$ and $d_4 < 0$) or ($d_3 < 0$ and $d_4 > 0$))
 6      return TRUE
 7  elseif $d_1 == 0$ and On-Segment($p_3$, $p_4$, $p_1$)
 8      return TRUE
 9  elseif $d_2 == 0$ and On-Segment($p_3$, $p_4$, $p_2$)
10      return TRUE
11  elseif $d_3 == 0$ and On-Segment($p_1$, $p_2$, $p_3$)
12      return TRUE
13  elseif $d_4 == 0$ and On-Segment($p_1$, $p_2$, $p_4$)
14      return TRUE
15  else return FALSE

Direction($p_i$, $p_j$, $p_k$)
 1  return $(p_k - p_i) \times (p_j - p_i)$

On-Segment($p_i$, $p_j$, $p_k$)
 1  if $\min(x_i, x_j) \le x_k \le \max(x_i, x_j)$ and $\min(y_i, y_j) \le y_k \le \max(y_i, y_j)$
 2      return TRUE
 3  else return FALSE

Segments-Intersect works as follows. Lines 1-4 compute the relative
orientation $d_i$ of each endpoint $p_i$ with respect to the other segment. If all the relative
orientations are nonzero, then we can easily determine whether segments $\overline{p_1p_2}$
and $\overline{p_3p_4}$ intersect, as follows. Segment $\overline{p_1p_2}$ straddles the line containing
segment $\overline{p_3p_4}$ if directed segments $\overrightarrow{p_3p_1}$ and $\overrightarrow{p_3p_2}$ have opposite orientations relative
to $\overrightarrow{p_3p_4}$. In this case, the signs of $d_1$ and $d_2$ differ. Similarly, $\overline{p_3p_4}$ straddles
the line containing $\overline{p_1p_2}$ if the signs of $d_3$ and $d_4$ differ. If the test of line 5 is
true, then the segments straddle each other, and Segments-Intersect returns
TRUE. Figure 33.3(a) shows this case. Otherwise, the segments do not straddle




Figure 33.3 Cases in the procedure Segments-Intersect. (a) The segments $\overline{p_1p_2}$ and $\overline{p_3p_4}$
straddle each other’s lines. Because $\overline{p_3p_4}$ straddles the line containing $\overline{p_1p_2}$, the signs of the cross
products $(p_3 - p_1) \times (p_2 - p_1)$ and $(p_4 - p_1) \times (p_2 - p_1)$ differ. Because $\overline{p_1p_2}$ straddles the line
containing $\overline{p_3p_4}$, the signs of the cross products $(p_1 - p_3) \times (p_4 - p_3)$ and $(p_2 - p_3) \times (p_4 - p_3)$
differ. (b) Segment $\overline{p_3p_4}$ straddles the line containing $\overline{p_1p_2}$, but $\overline{p_1p_2}$ does not straddle the line
containing $\overline{p_3p_4}$. The signs of the cross products $(p_1 - p_3) \times (p_4 - p_3)$ and $(p_2 - p_3) \times (p_4 - p_3)$
are the same. (c) Point $p_3$ is colinear with $\overline{p_1p_2}$ and is between $p_1$ and $p_2$. (d) Point $p_3$ is colinear
with $\overline{p_1p_2}$, but it is not between $p_1$ and $p_2$. The segments do not intersect.


each other’s lines, although a boundary case may apply. If all the relative
orientations are nonzero, no boundary case applies. All the tests against 0 in lines 7-13
then fail, and Segments-Intersect returns FALSE in line 15. Figure 33.3(b)
shows this case.

A boundary case occurs if any relative orientation $d_k$ is 0. Here, we know that $p_k$
is colinear with the other segment. It is directly on the other segment if and only
if it is between the endpoints of the other segment. The procedure On-Segment
returns whether $p_k$ is between the endpoints of segment $\overline{p_ip_j}$, which will be the
other segment when called in lines 7-13; the procedure assumes that $p_k$ is colinear
with segment $\overline{p_ip_j}$. Figures 33.3(c) and (d) show cases with colinear points. In
Figure 33.3(c), $p_3$ is on $\overline{p_1p_2}$, and so Segments-Intersect returns TRUE in
line 12. No endpoints are on other segments in Figure 33.3(d), and so
Segments-Intersect returns FALSE in line 15.
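
For readers who want to experiment, here is a hedged Python transcription of the three procedures. It assumes points are `(x, y)` tuples, ideally with integer coordinates so that the zero tests are exact. The snake_case names are this example's own.

```python
def direction(pi, pj, pk):
    """Relative orientation (pk - pi) x (pj - pi)."""
    return (pk[0] - pi[0]) * (pj[1] - pi[1]) - (pj[0] - pi[0]) * (pk[1] - pi[1])


def on_segment(pi, pj, pk):
    """Given pk colinear with segment pi-pj, report whether pk lies on it."""
    return (min(pi[0], pj[0]) <= pk[0] <= max(pi[0], pj[0])
            and min(pi[1], pj[1]) <= pk[1] <= max(pi[1], pj[1]))


def segments_intersect(p1, p2, p3, p4):
    """Return True if segment p1-p2 intersects segment p3-p4."""
    d1 = direction(p3, p4, p1)
    d2 = direction(p3, p4, p2)
    d3 = direction(p1, p2, p3)
    d4 = direction(p1, p2, p4)
    # General case: each segment straddles the line containing the other.
    if (((d1 > 0 and d2 < 0) or (d1 < 0 and d2 > 0))
            and ((d3 > 0 and d4 < 0) or (d3 < 0 and d4 > 0))):
        return True
    # Boundary cases: an endpoint is colinear with, and lies on, the other segment.
    if d1 == 0 and on_segment(p3, p4, p1):
        return True
    if d2 == 0 and on_segment(p3, p4, p2):
        return True
    if d3 == 0 and on_segment(p1, p2, p3):
        return True
    if d4 == 0 and on_segment(p1, p2, p4):
        return True
    return False


assert segments_intersect((0, 0), (2, 2), (0, 2), (2, 0))      # proper crossing
assert segments_intersect((0, 0), (2, 0), (1, 0), (1, 5))      # endpoint touches
assert not segments_intersect((0, 0), (1, 1), (2, 2), (3, 3))  # colinear, disjoint
```

With floating-point coordinates the comparisons against 0 become sensitive to rounding, so exact integer or rational inputs are the safer choice for this sketch.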
Other applications of cross products

Later sections of this chapter introduce additional uses for cross products. In
Section 33.3, we shall need to sort a set of points according to their polar angles with
respect to a given origin. As Exercise 33.1-3 asks you to show, we can use cross
products to perform the comparisons in the sorting procedure. In Section 33.2, we
shall use red-black trees to maintain the vertical ordering of a set of line segments.
Rather than keeping explicit key values which we compare to each other in the
red-black tree code, we shall compute a cross product to determine which of two
segments that intersect a given vertical line is above the other.

Exercises 


33.1-1 

Prove that if $p_1 \times p_2$ is positive, then vector $p_1$ is clockwise from vector $p_2$ with
respect to the origin (0, 0) and that if this cross product is negative, then $p_1$ is
counterclockwise from $p_2$.


33.1-2 

Professor van Pelt proposes that only the $x$-dimension needs to be tested in line 1
of On-Segment. Show why the professor is wrong.


33.1-3 

The polar angle of a point $p_1$ with respect to an origin point $p_0$ is the angle of the
vector $p_1 - p_0$ in the usual polar coordinate system. For example, the polar angle
of (3, 5) with respect to (2, 4) is the angle of the vector (1, 1), which is 45 degrees
or $\pi/4$ radians. The polar angle of (3, 3) with respect to (2, 4) is the angle of the
vector (1, -1), which is 315 degrees or $7\pi/4$ radians. Write pseudocode to sort a
sequence $\langle p_1, p_2, \ldots, p_n \rangle$ of $n$ points according to their polar angles with respect
to a given origin point $p_0$. Your procedure should take $O(n \lg n)$ time and use cross
products to compare angles.


33.1-4 

Show how to determine in $O(n^2 \lg n)$ time whether any three points in a set of $n$
points are colinear.


33.1-5 

A polygon is a piecewise-linear, closed curve in the plane. That is, it is a curve
ending on itself that is formed by a sequence of straight-line segments, called the
sides of the polygon. A point joining two consecutive sides is a vertex of the
polygon. If the polygon is simple, as we shall generally assume, it does not cross itself.
The set of points in the plane enclosed by a simple polygon forms the interior of
the polygon, the set of points on the polygon itself forms its boundary, and the set
of points surrounding the polygon forms its exterior. A simple polygon is convex
if, given any two points on its boundary or in its interior, all points on the line
segment drawn between them are contained in the polygon’s boundary or interior.
A vertex of a convex polygon cannot be expressed as a convex combination of any
two distinct points on the boundary or in the interior of the polygon.

Professor Amundsen proposes the following method to determine whether a
sequence $\langle p_0, p_1, \ldots, p_{n-1} \rangle$ of $n$ points forms the consecutive vertices of a convex
polygon. Output “yes” if the set $\{\angle p_i p_{i+1} p_{i+2} : i = 0, 1, \ldots, n-1\}$, where
subscript addition is performed modulo $n$, does not contain both left turns and right
turns; otherwise, output “no.” Show that although this method runs in linear time,
it does not always produce the correct answer. Modify the professor’s method so
that it always produces the correct answer in linear time.


33.1-6 

Given a point $p_0 = (x_0, y_0)$, the right horizontal ray from $p_0$ is the set of points
$\{p_i = (x_i, y_i) : x_i \ge x_0 \text{ and } y_i = y_0\}$, that is, it is the set of points due right of $p_0$
along with $p_0$ itself. Show how to determine whether a given right horizontal ray
from $p_0$ intersects a line segment $\overline{p_1p_2}$ in $O(1)$ time by reducing the problem to
that of determining whether two line segments intersect.


33.1-7 

One way to determine whether a point $p_0$ is in the interior of a simple, but not
necessarily convex, polygon $P$ is to look at any ray from $p_0$ and check that the ray
intersects the boundary of $P$ an odd number of times but that $p_0$ itself is not on
the boundary of $P$. Show how to compute in $\Theta(n)$ time whether a point $p_0$ is in
the interior of an $n$-vertex polygon $P$. (Hint: Use Exercise 33.1-6. Make sure your
algorithm is correct when the ray intersects the polygon boundary at a vertex and
when the ray overlaps a side of the polygon.)


33.1-8 

Show how to compute the area of an $n$-vertex simple, but not necessarily convex,
polygon in $\Theta(n)$ time. (See Exercise 33.1-5 for definitions pertaining to polygons.)


33.2  Determining  whether  any  pair  of  segments  intersects 

This section presents an algorithm for determining whether any two line segments
in a set of segments intersect. The algorithm uses a technique known as
“sweeping,” which is common to many computational-geometry algorithms. Moreover, as
the exercises at the end of this section show, this algorithm, or simple variations of
it, can help solve other computational-geometry problems.

The algorithm runs in $O(n \lg n)$ time, where $n$ is the number of segments we are
given. It determines only whether or not any intersection exists; it does not print
all the intersections. (By Exercise 33.2-1, it takes $\Omega(n^2)$ time in the worst case to
find all the intersections in a set of $n$ line segments.)

In sweeping, an imaginary vertical sweep line passes through the given set of
geometric objects, usually from left to right. We treat the spatial dimension that
the sweep line moves across, in this case the $x$-dimension, as a dimension of
time. Sweeping provides a method for ordering geometric objects, usually by
placing them into a dynamic data structure, and for taking advantage of relationships
among them. The line-segment-intersection algorithm in this section considers all
the line-segment endpoints in left-to-right order and checks for an intersection each
time it encounters an endpoint.

To  describe  and  prove  correct  our  algorithm  for  determining  whether  any  two 
of  n  line  segments  intersect,  we  shall  make  two  simplifying  assumptions.  First,  we 
assume  that  no  input  segment  is  vertical.  Second,  we  assume  that  no  three  input 
segments  intersect  at  a  single  point.  Exercises  33.2-8  and  33.2-9  ask  you  to  show 
that  the  algorithm  is  robust  enough  that  it  needs  only  a  slight  modification  to  work 
even  when  these  assumptions  do  not  hold.  Indeed,  removing  such  simplifying 
assumptions  and  dealing  with  boundary  conditions  often  present  the  most  difficult 
challenges  when  programming  computational-geometry  algorithms  and  proving 
their  correctness. 

Ordering  segments 

Because we assume that there are no vertical segments, we know that any input
segment intersecting a given vertical sweep line intersects it at a single point. Thus,
we can order the segments that intersect a vertical sweep line according to the
$y$-coordinates of the points of intersection.

To be more precise, consider two segments $s_1$ and $s_2$. We say that these segments
are comparable at $x$ if the vertical sweep line with $x$-coordinate $x$ intersects both of
them. We say that $s_1$ is above $s_2$ at $x$, written $s_1 \succcurlyeq_x s_2$, if $s_1$ and $s_2$ are comparable
at $x$ and the intersection of $s_1$ with the sweep line at $x$ is higher than the intersection
of $s_2$ with the same sweep line, or if $s_1$ and $s_2$ intersect at the sweep line. In
Figure 33.4(a), for example, we have the relationships $a \succcurlyeq_r c$, $a \succcurlyeq_t b$, $b \succcurlyeq_t c$,
$a \succcurlyeq_t c$, and $b \succcurlyeq_u c$. Segment $d$ is not comparable with any other segment.
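
To make the definition concrete, the sketch below evaluates the "above at x" relation directly from the $y$-coordinates on the sweep line. It assumes non-vertical segments given as pairs of integer points and uses exact rational arithmetic from Python's `fractions` module. The names `y_at`, `comparable`, and `above` are illustrative only, and this direct approach relies on division, unlike the cross-product method that Exercise 33.2-2 asks for.

```python
from fractions import Fraction


def y_at(seg, x):
    """y-coordinate where the non-vertical segment seg meets the vertical line at x."""
    (x1, y1), (x2, y2) = seg
    return y1 + Fraction((y2 - y1) * (x - x1), x2 - x1)


def comparable(s1, s2, x):
    """True if the vertical sweep line at x intersects both segments."""
    return all(min(a[0], b[0]) <= x <= max(a[0], b[0]) for a, b in (s1, s2))


def above(s1, s2, x):
    """True if s1 is above s2 at x; segments meeting on the sweep line are mutually above."""
    return comparable(s1, s2, x) and y_at(s1, x) >= y_at(s2, x)


e = ((0, 0), (4, 4))
f = ((0, 4), (4, 0))
assert above(f, e, 1) and not above(e, f, 1)   # left of the crossing, f is on top
assert above(e, f, 2) and above(f, e, 2)       # at the crossing, both hold
assert above(e, f, 3) and not above(f, e, 3)   # right of the crossing, e is on top
```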

For any given $x$, the relation “$\succcurlyeq_x$” is a total preorder (see Section B.2) for all
segments that intersect the sweep line at $x$. That is, the relation is transitive, and
if segments $s_1$ and $s_2$ each intersect the sweep line at $x$, then either $s_1 \succcurlyeq_x s_2$
or $s_2 \succcurlyeq_x s_1$, or both (if $s_1$ and $s_2$ intersect at the sweep line). (The relation is
Figure 33.4 The ordering among line segments at various vertical sweep lines. (a) We have $a \succcurlyeq_r c$,
$a \succcurlyeq_t b$, $b \succcurlyeq_t c$, $a \succcurlyeq_t c$, and $b \succcurlyeq_u c$. Segment $d$ is comparable with no other segment shown.
(b) When segments $e$ and $f$ intersect, they reverse their orders: we have $e \succcurlyeq_v f$ but $f \succcurlyeq_w e$. Any
sweep line (such as $z$) that passes through the shaded region has $e$ and $f$ consecutive in the ordering
given by the relation $\succcurlyeq_z$.

also  reflexive,  but  neither  symmetric  nor  antisymmetric.)  The  total  preorder  may 
differ  for  differing  values  of  x,  however,  as  segments  enter  and  leave  the  ordering. 
A  segment  enters  the  ordering  when  its  left  endpoint  is  encountered  by  the  sweep, 
and  it  leaves  the  ordering  when  its  right  endpoint  is  encountered. 

What happens when the sweep line passes through the intersection of two
segments? As Figure 33.4(b) shows, the segments reverse their positions in the total
preorder. Sweep lines $v$ and $w$ are to the left and right, respectively, of the point
of intersection of segments $e$ and $f$, and we have $e \succcurlyeq_v f$ and $f \succcurlyeq_w e$. Note
that because we assume that no three segments intersect at the same point, there
must be some vertical sweep line $x$ for which intersecting segments $e$ and $f$ are
consecutive in the total preorder $\succcurlyeq_x$. Any sweep line that passes through the shaded
region of Figure 33.4(b), such as $z$, has $e$ and $f$ consecutive in its total preorder.

Moving  the  sweep  line 

Sweeping  algorithms  typically  manage  two  sets  of  data: 

1.  The  sweep-line  status  gives  the  relationships  among  the  objects  that  the  sweep 
line  intersects. 

2. The event-point schedule is a sequence of points, called event points, which
we order from left to right according to their $x$-coordinates. As the sweep
progresses from left to right, whenever the sweep line reaches the $x$-coordinate
of an event point, the sweep halts, processes the event point, and then resumes.
Changes to the sweep-line status occur only at event points.

For some algorithms (the algorithm asked for in Exercise 33.2-7, for example),
the event-point schedule develops dynamically as the algorithm progresses. The
algorithm at hand, however, determines all the event points before the sweep, based
solely on simple properties of the input data. In particular, each segment endpoint
is an event point. We sort the segment endpoints by increasing $x$-coordinate and
proceed from left to right. (If two or more endpoints are covertical, i.e., they have
the same $x$-coordinate, we break the tie by putting all the covertical left endpoints
before the covertical right endpoints. Within a set of covertical left endpoints, we
put those with lower $y$-coordinates first, and we do the same within a set of
covertical right endpoints.) When we encounter a segment’s left endpoint, we insert the
segment into the sweep-line status, and we delete the segment from the sweep-line
status upon encountering its right endpoint. Whenever two segments first become
consecutive in the total preorder, we check whether they intersect.

The  sweep-line  status  is  a  total  preorder  T,  for  which  we  require  the  following 
operations: 

• Insert($T$, $s$): insert segment $s$ into $T$.

• Delete($T$, $s$): delete segment $s$ from $T$.

• Above($T$, $s$): return the segment immediately above segment $s$ in $T$.

• Below($T$, $s$): return the segment immediately below segment $s$ in $T$.

It is possible for segments $s_1$ and $s_2$ to be mutually above each other in the total
preorder $T$; this situation can occur if $s_1$ and $s_2$ intersect at the sweep line whose
total preorder is given by $T$. In this case, the two segments may appear in either
order in $T$.

If the input contains $n$ segments, we can perform each of the operations Insert,
Delete, Above, and Below in $O(\lg n)$ time using red-black trees. Recall that
the red-black-tree operations in Chapter 13 involve comparing keys. We can
replace the key comparisons by comparisons that use cross products to determine the
relative ordering of two segments (see Exercise 33.2-2).

Segment-intersection  pseudocode 

The  following  algorithm  takes  as  input  a  set  S  of  n  line  segments,  returning  the 
boolean  value  TRUE  if  any  pair  of  segments  in  S  intersects,  and  FALSE  otherwise. 
A  red-black  tree  maintains  the  total  preorder  T. 
Any-Segments-Intersect($S$)
 1  $T = \emptyset$
 2  sort the endpoints of the segments in $S$ from left to right,
        breaking ties by putting left endpoints before right endpoints
        and breaking further ties by putting points with lower
        $y$-coordinates first
 3  for each point $p$ in the sorted list of endpoints
 4      if $p$ is the left endpoint of a segment $s$
 5          Insert($T$, $s$)
 6          if (Above($T$, $s$) exists and intersects $s$)
                or (Below($T$, $s$) exists and intersects $s$)
 7              return TRUE
 8      if $p$ is the right endpoint of a segment $s$
 9          if both Above($T$, $s$) and Below($T$, $s$) exist
                and Above($T$, $s$) intersects Below($T$, $s$)
10              return TRUE
11          Delete($T$, $s$)
12  return FALSE

Figure 33.5 illustrates how the algorithm works. Line 1 initializes the total preorder
to be empty. Line 2 determines the event-point schedule by sorting the $2n$ segment
endpoints from left to right, breaking ties as described above. One way to perform
line 2 is by lexicographically sorting the endpoints on $(x, e, y)$, where $x$ and $y$ are
the usual coordinates, $e = 0$ for a left endpoint, and $e = 1$ for a right endpoint.
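
The lexicographic sort on $(x, e, y)$ can be sketched in Python as follows. The function name `event_schedule` and the representation of a segment as a pair of `(x, y)` tuples are assumptions of this example, and the segments are assumed to be non-vertical so that the left endpoint is simply the one with the smaller $x$-coordinate.

```python
def event_schedule(segments):
    """Return events (x, e, y, segment) sorted lexicographically on (x, e, y).

    e is 0 for a left endpoint and 1 for a right endpoint, so covertical left
    endpoints come before covertical right endpoints, with lower y first.
    """
    events = []
    for seg in segments:
        left, right = sorted(seg)  # non-vertical: smaller x is the left endpoint
        events.append((left[0], 0, left[1], seg))
        events.append((right[0], 1, right[1], seg))
    events.sort(key=lambda ev: ev[:3])
    return events


s1 = ((0, 0), (2, 2))
s2 = ((2, 0), (3, 1))
order = [(x, e, y) for x, e, y, _ in event_schedule([s1, s2])]
# At x = 2 the left endpoint of s2 precedes the right endpoint of s1.
assert order == [(0, 0, 0), (2, 0, 0), (2, 1, 2), (3, 1, 1)]
```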

Each iteration of the for loop of lines 3-11 processes one event point $p$. If $p$ is
the left endpoint of a segment $s$, line 5 adds $s$ to the total preorder, and lines 6-7
return TRUE if $s$ intersects either of the segments it is consecutive with in the total
preorder defined by the sweep line passing through $p$. (A boundary condition
occurs if $p$ lies on another segment $s'$. In this case, we require only that $s$ and $s'$
be placed consecutively into $T$.) If $p$ is the right endpoint of a segment $s$, then
we need to delete $s$ from the total preorder. But first, lines 9-10 return TRUE if
there is an intersection between the segments surrounding $s$ in the total preorder
defined by the sweep line passing through $p$. If these segments do not intersect,
line 11 deletes segment $s$ from the total preorder. If the segments surrounding
segment $s$ intersect, they would have become consecutive after deleting $s$ had the
return statement in line 10 not prevented line 11 from executing. The correctness
argument, which follows, will make it clear why it suffices to check the segments
surrounding $s$. Finally, if we never find any intersections after having processed
all $2n$ event points, line 12 returns FALSE.
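
The sweep described above can be prototyped in Python as shown below. This is a sketch under stated assumptions, not the text's implementation: it keeps the sweep-line status in a plain Python list ordered bottom to top, so each insertion and deletion costs $O(n)$ rather than the $O(\lg n)$ of a red-black tree, and it assumes non-vertical segments with integer coordinates so that the `Fraction` arithmetic is exact. All function names are illustrative.

```python
from fractions import Fraction


def cross(o, a, b):
    """Cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (b[0] - o[0]) * (a[1] - o[1])


def on_segment(a, b, c):
    """Given c colinear with segment a-b, report whether c lies on it."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def intersects(s, t):
    """Constant-time segment intersection test based on cross products."""
    (p1, p2), (p3, p4) = s, t
    d1, d2 = cross(p3, p4, p1), cross(p3, p4, p2)
    d3, d4 = cross(p1, p2, p3), cross(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    return ((d1 == 0 and on_segment(p3, p4, p1))
            or (d2 == 0 and on_segment(p3, p4, p2))
            or (d3 == 0 and on_segment(p1, p2, p3))
            or (d4 == 0 and on_segment(p1, p2, p4)))


def y_at(seg, x):
    """Exact y-coordinate of a non-vertical segment on the sweep line at x."""
    (x1, y1), (x2, y2) = seg
    return y1 + Fraction((y2 - y1) * (x - x1), x2 - x1)


def any_segments_intersect(segments):
    segs = [tuple(sorted(s)) for s in segments]       # left endpoint first
    events = []
    for i, (left, right) in enumerate(segs):
        events.append((left[0], 0, left[1], i))       # e = 0: left endpoint
        events.append((right[0], 1, right[1], i))     # e = 1: right endpoint
    events.sort()                                     # lexicographic on (x, e, y)
    status = []                                       # segment indices, bottom to top
    for x, e, y, i in events:
        if e == 0:
            k = 0
            while k < len(status) and y_at(segs[status[k]], x) <= y:
                k += 1
            status.insert(k, i)
            if k > 0 and intersects(segs[status[k - 1]], segs[i]):
                return True                           # segment below intersects s
            if k + 1 < len(status) and intersects(segs[status[k + 1]], segs[i]):
                return True                           # segment above intersects s
        else:
            k = status.index(i)
            if 0 < k < len(status) - 1 and intersects(
                    segs[status[k - 1]], segs[status[k + 1]]):
                return True                           # neighbors of s intersect
            status.pop(k)
    return False


# The crossing of the first and third segments is found when the middle one is deleted.
assert any_segments_intersect([((0, 0), (10, 10)), ((0, 5), (2, 5)), ((1, 10), (9, 0))])
assert not any_segments_intersect([((0, 0), (4, 0)), ((0, 1), (4, 1)), ((5, 0), (6, 3))])
```

Swapping the list for a balanced search tree keyed by the segment ordering would restore the $O(n \lg n)$ bound without changing the control flow.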
Figure 33.5 The execution of Any-Segments-Intersect. Each dashed line is the sweep line at
an event point. Except for the rightmost sweep line, the ordering of segment names below each sweep
line corresponds to the total preorder $T$ at the end of the for loop processing the corresponding event
point. The rightmost sweep line occurs when processing the right endpoint of segment $c$; because
segments $d$ and $b$ surround $c$ and intersect each other, the procedure returns TRUE.


Correctness 

To show that Any-Segments-Intersect is correct, we will prove that the call
Any-Segments-Intersect($S$) returns TRUE if and only if there is an
intersection among the segments in $S$.

It is easy to see that Any-Segments-Intersect returns TRUE (on lines 7
and 10) only if it finds an intersection between two of the input segments. Hence,
if it returns TRUE, there is an intersection.

We also need to show the converse: that if there is an intersection, then
Any-Segments-Intersect returns TRUE. Let us suppose that there is at least one
intersection. Let $p$ be the leftmost intersection point, breaking ties by choosing the
point with the lowest $y$-coordinate, and let $a$ and $b$ be the segments that intersect
at $p$. Since no intersections occur to the left of $p$, the order given by $T$ is correct at
all points to the left of $p$. Because no three segments intersect at the same point, $a$
and $b$ become consecutive in the total preorder at some sweep line $z$.² Moreover,
$z$ is to the left of $p$ or goes through $p$. Some segment endpoint $q$ on sweep line $z$


²If we allow three segments to intersect at the same point, there may be an intervening segment $c$ that
intersects both $a$ and $b$ at point $p$. That is, we may have $a \succcurlyeq_w c$ and $c \succcurlyeq_w b$ for all sweep lines $w$ to
the left of $p$ for which $a \succcurlyeq_w b$. Exercise 33.2-8 asks you to show that Any-Segments-Intersect
is correct even if three segments do intersect at the same point.
is the event point at which $a$ and $b$ become consecutive in the total preorder. If $p$
is on sweep line $z$, then $q = p$. If $p$ is not on sweep line $z$, then $q$ is to the left
of $p$. In either case, the order given by $T$ is correct just before encountering $q$.
(Here is where we use the lexicographic order in which the algorithm processes
event points. Because $p$ is the lowest of the leftmost intersection points, even if $p$
is on sweep line $z$ and some other intersection point $p'$ is on $z$, event point $q = p$
is processed before the other intersection $p'$ can interfere with the total preorder $T$.
Moreover, even if $p$ is the left endpoint of one segment, say $a$, and the right
endpoint of the other segment, say $b$, because left endpoint events occur before right
endpoint events, segment $b$ is in $T$ upon first encountering segment $a$.) Either event
point $q$ is processed by Any-Segments-Intersect or it is not processed.

If  q  is  processed  by  Any-Segments-Intersect,  only  two  possible  actions 
may  occur: 

1. Either $a$ or $b$ is inserted into $T$, and the other segment is above or below it in
the total preorder. Lines 4-7 detect this case.

2. Segments $a$ and $b$ are already in $T$, and a segment between them in the total
preorder is deleted, making $a$ and $b$ become consecutive. Lines 8-11 detect this
case.

In  either  case,  we  find  the  intersection  p  and  Any-Segments-Intersect  returns 
TRUE. 

If event point $q$ is not processed by Any-Segments-Intersect, the
procedure must have returned before processing all event points. This situation could
have occurred only if Any-Segments-Intersect had already found an
intersection and returned TRUE.

Thus, if there is an intersection, Any-Segments-Intersect returns TRUE.
As we have already seen, if Any-Segments-Intersect returns TRUE, there is
an intersection. Therefore, Any-Segments-Intersect always returns a correct
answer.

Running  time 

If set $S$ contains $n$ segments, then Any-Segments-Intersect runs in time
$O(n \lg n)$. Line 1 takes $O(1)$ time. Line 2 takes $O(n \lg n)$ time, using merge
sort or heapsort. The for loop of lines 3-11 iterates at most once per event point,
and so with $2n$ event points, the loop iterates at most $2n$ times. Each iteration takes
$O(\lg n)$ time, since each red-black-tree operation takes $O(\lg n)$ time and, using the
method of Section 33.1, each intersection test takes $O(1)$ time. The total time is
thus $O(n \lg n)$.
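
As a compact restatement of this accounting (the symbol $T(n)$ for the total running time is introduced here only for illustration), the three contributions add up as follows:

```latex
T(n) = \underbrace{O(1)}_{\text{line 1}}
     + \underbrace{O(n \lg n)}_{\text{line 2}}
     + \underbrace{2n \cdot \bigl( O(\lg n) + O(1) \bigr)}_{\text{lines 3--11}}
     = O(n \lg n) .
```

The loop term is dominated by the red-black-tree operations, since each iteration performs a constant number of them plus a constant number of $O(1)$ intersection tests.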
Exercises


33.2-1 

Show that a set of $n$ line segments may contain $\Theta(n^2)$ intersections.


33.2-2 

Given two segments $a$ and $b$ that are comparable at $x$, show how to determine
in $O(1)$ time which of $a \succcurlyeq_x b$ or $b \succcurlyeq_x a$ holds. Assume that neither segment
is vertical. (Hint: If $a$ and $b$ do not intersect, you can just use cross products.
If $a$ and $b$ intersect, which you can of course determine using only cross
products, you can still use only addition, subtraction, and multiplication, avoiding
division. Of course, in the application of the $\succcurlyeq_x$ relation used here, if $a$ and $b$
intersect, we can just stop and declare that we have found an intersection.)


33.2-3 

Professor Mason suggests that we modify Any-Segments-Intersect so that
instead of returning upon finding an intersection, it prints the segments that
intersect and continues on to the next iteration of the for loop. The professor calls the
resulting procedure Print-Intersecting-Segments and claims that it prints
all intersections, from left to right, as they occur in the set of line segments.
Professor Dixon disagrees, claiming that Professor Mason’s idea is incorrect. Which
professor is right? Will Print-Intersecting-Segments always find the
leftmost intersection first? Will it always find all the intersections?


33.2-4 

Give an O(n lg n)-time algorithm to determine whether an n-vertex polygon is
simple.


33.2-5 

Give an O(n lg n)-time algorithm to determine whether two simple polygons with
a total of n vertices intersect.


33.2-6 

A disk consists of a circle plus its interior and is represented by its center point and
radius. Two disks intersect if they have any point in common. Give an
O(n lg n)-time algorithm to determine whether any two disks in a set of n intersect.


33.2-7 

Given a set of n line segments containing a total of k intersections, show how to
output all k intersections in O((n + k) lg n) time.


33.2-8

Argue  that  Any-Segments-Intersect  works  correctly  even  if  three  or  more 
segments  intersect  at  the  same  point. 

33.2-9

Show that Any-Segments-Intersect works correctly in the presence of
vertical segments if we treat the bottom endpoint of a vertical segment as if it were a
left endpoint and the top endpoint as if it were a right endpoint. How does your
answer to Exercise 33.2-2 change if we allow vertical segments?


33.3  Finding  the  convex  hull 

The convex hull of a set Q of points, denoted by CH(Q), is the smallest convex
polygon  P  for  which  each  point  in  Q  is  either  on  the  boundary  of  P  or  in  its 
interior.  (See  Exercise  33.1-5  for  a  precise  definition  of  a  convex  polygon.)  We 
implicitly  assume  that  all  points  in  the  set  Q  are  unique  and  that  Q  contains  at 
least  three  points  which  are  not  colinear.  Intuitively,  we  can  think  of  each  point 
in  Q  as  being  a  nail  sticking  out  from  a  board.  The  convex  hull  is  then  the  shape 
formed  by  a  tight  rubber  band  that  surrounds  all  the  nails.  Figure  33.6  shows  a  set 
of  points  and  its  convex  hull. 

In this section, we shall present two algorithms that compute the convex hull
of a set of n points. Both algorithms output the vertices of the convex hull in
counterclockwise order. The first, known as Graham’s scan, runs in O(n lg n) time.
The second, called Jarvis’s march, runs in O(nh) time, where h is the number of
vertices of the convex hull. As Figure 33.6 illustrates, every vertex of CH(Q) is a
point in Q. Both algorithms exploit this property, deciding which vertices in Q to
keep as vertices of the convex hull and which vertices in Q to reject.

Figure 33.6 A set of points Q = {p0, p1, ..., p12} with its convex hull CH(Q) in gray.

We can compute convex hulls in O(n lg n) time by any one of several methods.
Both  Graham’s  scan  and  Jarvis’s  march  use  a  technique  called  “rotational  sweep,” 
processing  vertices  in  the  order  of  the  polar  angles  they  form  with  a  reference 
vertex.  Other  methods  include  the  following: 

•  In the incremental method, we first sort the points from left to right, yielding a
sequence ⟨p1, p2, ..., p_n⟩. At the ith stage, we update the convex hull of the
i - 1 leftmost points, CH({p1, p2, ..., p_{i-1}}), according to the ith point from
the left, thus forming CH({p1, p2, ..., p_i}). Exercise 33.3-6 asks you how to
implement this method to take a total of O(n lg n) time.

•  In the divide-and-conquer method, we divide the set of n points in Θ(n) time
into two subsets, one containing the leftmost ⌈n/2⌉ points and one containing
the rightmost ⌊n/2⌋ points, recursively compute the convex hulls of the subsets,
and then, by means of a clever method, combine the hulls in O(n) time. The
running time is described by the familiar recurrence T(n) = 2T(n/2) + O(n),
and so the divide-and-conquer method runs in O(n lg n) time.

•  The prune-and-search method is similar to the worst-case linear-time median
algorithm of Section 9.3. With this method, we find the upper portion (or “upper
chain”) of the convex hull by repeatedly throwing out a constant fraction of the
remaining points until only the upper chain of the convex hull remains. We then
do the same for the lower chain. This method is asymptotically the fastest: if
the convex hull contains h vertices, it runs in only O(n lg h) time.

Computing the convex hull of a set of points is an interesting problem in its own
right. Moreover, algorithms for some other computational-geometry problems start
by computing a convex hull. Consider, for example, the two-dimensional
farthest-pair problem: we are given a set of n points in the plane and wish to find the
two points whose distance from each other is maximum. As Exercise 33.3-3 asks
you to prove, these two points must be vertices of the convex hull. Although we
won’t prove it here, we can find the farthest pair of vertices of an n-vertex convex
polygon in O(n) time. Thus, by computing the convex hull of the n input points
in O(n lg n) time and then finding the farthest pair of the resulting convex-polygon
vertices, we can find the farthest pair of points in any set of n points in O(n lg n)
time.

Graham’s  scan 

Graham’s scan solves the convex-hull problem by maintaining a stack S of
candidate points. It pushes each point of the input set Q onto the stack one time,
and it eventually pops from the stack each point that is not a vertex of CH(Q).
When the algorithm terminates, stack S contains exactly the vertices of CH(Q), in
counterclockwise order of their appearance on the boundary.

The procedure Graham-Scan takes as input a set Q of points, where |Q| ≥ 3.
It calls the functions Top(S), which returns the point on top of stack S without
changing S, and Next-To-Top(S), which returns the point one entry below the
top of stack S without changing S. As we shall prove in a moment, the stack S
returned by Graham-Scan contains, from bottom to top, exactly the vertices
of CH(Q) in counterclockwise order.

Graham-Scan(Q)
 1  let p0 be the point in Q with the minimum y-coordinate,
        or the leftmost such point in case of a tie
 2  let ⟨p1, p2, ..., p_m⟩ be the remaining points in Q,
        sorted by polar angle in counterclockwise order around p0
        (if more than one point has the same angle, remove all but
        the one that is farthest from p0)
 3  let S be an empty stack
 4  Push(p0, S)
 5  Push(p1, S)
 6  Push(p2, S)
 7  for i = 3 to m
 8      while the angle formed by points Next-To-Top(S), Top(S),
                and p_i makes a nonleft turn
 9          Pop(S)
10      Push(p_i, S)
11  return S

Figure 33.7 illustrates the progress of Graham-Scan. Line 1 chooses point p0
as the point with the lowest y-coordinate, picking the leftmost such point in case
of a tie. Since there is no point in Q that is below p0 and any other points with
the same y-coordinate are to its right, p0 must be a vertex of CH(Q). Line 2
sorts the remaining points of Q by polar angle relative to p0, using the same
method (comparing cross products) as in Exercise 33.1-3. If two or more points
have the same polar angle relative to p0, all but the farthest such point are convex
combinations of p0 and the farthest point, and so we remove them entirely from
consideration. We let m denote the number of points other than p0 that remain.
The polar angle, measured in radians, of each point in Q relative to p0 is in the
half-open interval [0, π). Since the points are sorted according to polar angles,
they are sorted in counterclockwise order relative to p0. We designate this sorted
sequence of points by ⟨p1, p2, ..., p_m⟩. Note that points p1 and p_m are vertices
of CH(Q) (see Exercise 33.3-1). Figure 33.7(a) shows the points of Figure 33.6
sequentially numbered in order of increasing polar angle relative to p0.

Figure 33.7 The execution of Graham-Scan on the set Q of Figure 33.6. The current convex
hull contained in stack S is shown in gray at each step. (a) The sequence ⟨p1, p2, ..., p12⟩ of points
numbered in order of increasing polar angle relative to p0, and the initial stack S containing p0, p1,
and p2. (b)-(k) Stack S after each iteration of the for loop of lines 7-10. Dashed lines show nonleft
turns, which cause points to be popped from the stack. In part (h), for example, the right turn at
angle ∠p7 p8 p9 causes p8 to be popped, and then the right turn at angle ∠p6 p7 p9 causes p7 to be
popped. (l) The convex hull returned by the procedure, which matches that of Figure 33.6.

The remainder of the procedure uses the stack S. Lines 3-6 initialize the stack
to contain, from bottom to top, the first three points p0, p1, and p2. Figure 33.7(a)
shows the initial stack S. The for loop of lines 7-10 iterates once for each point
in the subsequence ⟨p3, p4, ..., p_m⟩. We shall see that after processing point p_i,
stack S contains, from bottom to top, the vertices of CH({p0, p1, ..., p_i}) in
counterclockwise order. The while loop of lines 8-9 removes points from the stack if
we find them not to be vertices of the convex hull. When we traverse the convex
hull counterclockwise, we should make a left turn at each vertex. Thus, each time
the while loop finds a vertex at which we make a nonleft turn, we pop the vertex
from the stack. (By checking for a nonleft turn, rather than just a right turn, this
test precludes the possibility of a straight angle at a vertex of the resulting convex
hull. We want no straight angles, since no vertex of a convex polygon may be a
convex combination of other vertices of the polygon.) After we pop all vertices
that have nonleft turns when heading toward point p_i, we push p_i onto the stack.
Figures 33.7(b)-(k) show the state of the stack S after each iteration of the for
loop. Finally, Graham-Scan returns the stack S in line 11. Figure 33.7(l) shows
the corresponding convex hull.
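
As an illustration, the procedure can be sketched in Python. This is a minimal sketch under stated assumptions rather than a definitive implementation: the names graham_scan, cross, and dist2 are hypothetical, points are assumed to be (x, y) tuples (integer coordinates keep the cross-product tests exact), duplicate points are discarded up front, and a Python list stands in for the stack S.

```python
from functools import cmp_to_key


def cross(o, a, b):
    """Cross product (a - o) x (b - o); positive means o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def dist2(a, b):
    """Squared euclidean distance, which avoids square roots."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def graham_scan(points):
    """Return the hull vertices in counterclockwise order, starting at p0."""
    pts = list(set(points))  # assumption: duplicates are simply dropped
    # Line 1: lowest point, leftmost in case of a tie.
    p0 = min(pts, key=lambda p: (p[1], p[0]))
    rest = [p for p in pts if p != p0]

    # Line 2: sort by polar angle around p0, comparing cross products only.
    def by_polar_angle(a, b):
        turn = cross(p0, a, b)
        if turn != 0:
            return -1 if turn > 0 else 1
        d = dist2(p0, a) - dist2(p0, b)  # same angle: nearer point first
        return (d > 0) - (d < 0)

    rest.sort(key=cmp_to_key(by_polar_angle))

    # Keep only the farthest point for each polar angle.
    ordered = []
    for p in rest:
        if ordered and cross(p0, ordered[-1], p) == 0:
            ordered[-1] = p  # p is at least as far away as the point it replaces
        else:
            ordered.append(p)
    if len(ordered) < 2:
        raise ValueError("need at least three points that are not collinear")

    # Lines 3-6: initialize the stack.
    stack = [p0, ordered[0], ordered[1]]
    # Lines 7-10: pop while the top of the stack makes a nonleft turn.
    for p in ordered[2:]:
        while len(stack) > 1 and cross(stack[-2], stack[-1], p) <= 0:
            stack.pop()
        stack.append(p)
    return stack
```

For example, given the corners of the square (0, 0), (4, 0), (4, 4), (0, 4) together with a few points inside it or on its bottom edge, the sketch returns the four corners in counterclockwise order starting from (0, 0).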

The following theorem formally proves the correctness of Graham-Scan.

Theorem 33.1 (Correctness of Graham’s scan)

If Graham-Scan executes on a set Q of points, where |Q| ≥ 3, then at
termination, the stack S consists of, from bottom to top, exactly the vertices of CH(Q) in
counterclockwise order.

Proof After line 2, we have the sequence of points ⟨p1, p2, ..., p_m⟩. Let us
define, for i = 2, 3, ..., m, the subset of points Q_i = {p0, p1, ..., p_i}. The
points in Q - Q_m are those that were removed because they had the same polar
angle relative to p0 as some point in Q_m; these points are not in CH(Q), and
so CH(Q_m) = CH(Q). Thus, it suffices to show that when Graham-Scan
terminates, the stack S consists of the vertices of CH(Q_m) in counterclockwise
order, when listed from bottom to top. Note that just as p0, p1, and p_m are vertices
of CH(Q), the points p0, p1, and p_i are all vertices of CH(Q_i).

The  proof  uses  the  following  loop  invariant: 

At the start of each iteration of the for loop of lines 7-10, stack S consists of,
from bottom to top, exactly the vertices of CH(Q_{i-1}) in counterclockwise
order.

Initialization: The invariant holds the first time we execute line 7, since at that
time, stack S consists of exactly the vertices of Q_2 = Q_{i-1}, and this set of three
vertices forms its own convex hull. Moreover, they appear in counterclockwise
order from bottom to top.

Figure 33.8 The proof of correctness of Graham-Scan. (a) Because p_i’s polar angle relative
to p0 is greater than p_j’s polar angle, and because the angle ∠p_k p_j p_i makes a left turn, adding p_i
to CH(Q_j) gives exactly the vertices of CH(Q_j ∪ {p_i}). (b) If the angle ∠p_r p_t p_i makes a nonleft
turn, then p_t is either in the interior of the triangle formed by p0, p_r, and p_i or on a side of the
triangle, which means that it cannot be a vertex of CH(Q_i).

Maintenance: Entering an iteration of the for loop, the top point on stack S
is p_{i-1}, which was pushed at the end of the previous iteration (or before the
first iteration, when i = 3). Let p_j be the top point on S after executing the
while loop of lines 8-9 but before line 10 pushes p_i, and let p_k be the point
just below p_j on S. At the moment that p_j is the top point on S and we have
not yet pushed p_i, stack S contains exactly the same points it contained after
iteration j of the for loop. By the loop invariant, therefore, S contains exactly
the vertices of CH(Q_j) at that moment, and they appear in counterclockwise
order from bottom to top.

Let us continue to focus on this moment just before pushing p_i. We know
that p_i’s polar angle relative to p0 is greater than p_j’s polar angle and that
the angle ∠p_k p_j p_i makes a left turn (otherwise we would have popped p_j).
Therefore, because S contains exactly the vertices of CH(Q_j), we see from
Figure 33.8(a) that once we push p_i, stack S will contain exactly the vertices
of CH(Q_j ∪ {p_i}), still in counterclockwise order from bottom to top.

We now show that CH(Q_j ∪ {p_i}) is the same set of points as CH(Q_i). Consider
any point p_t that was popped during iteration i of the for loop, and let p_r be
the point just below p_t on stack S at the time p_t was popped (p_r might be p_j).
The angle ∠p_r p_t p_i makes a nonleft turn, and the polar angle of p_t relative
to p0 is greater than the polar angle of p_r. As Figure 33.8(b) shows, p_t must
be either in the interior of the triangle formed by p0, p_r, and p_i or on a side of
this triangle (but it is not a vertex of the triangle). Clearly, since p_t is within a
triangle formed by three other points of Q_i, it cannot be a vertex of CH(Q_i).
Since p_t is not a vertex of CH(Q_i), we have that

CH(Q_i - {p_t}) = CH(Q_i) .    (33.1)

Let P_i be the set of points that were popped during iteration i of the for loop.
Since the equality (33.1) applies for all points in P_i, we can apply it repeatedly
to show that CH(Q_i - P_i) = CH(Q_i). But Q_i - P_i = Q_j ∪ {p_i}, and so we
conclude that CH(Q_j ∪ {p_i}) = CH(Q_i - P_i) = CH(Q_i).

We have shown that once we push p_i, stack S contains exactly the vertices
of CH(Q_i) in counterclockwise order from bottom to top. Incrementing i will
then cause the loop invariant to hold for the next iteration.

Termination: When the loop terminates, we have i = m + 1, and so the loop
invariant implies that stack S consists of exactly the vertices of CH(Q_m), which
is CH(Q), in counterclockwise order from bottom to top. This completes the
proof. ■

We now show that the running time of Graham-Scan is O(n lg n), where
n = |Q|. Line 1 takes O(n) time. Line 2 takes O(n lg n) time, using merge sort
or heapsort to sort the polar angles and the cross-product method of Section 33.1
to compare angles. (We can remove all but the farthest point with the same polar
angle in a total of O(n) time over all n points.) Lines 3-6 take O(1) time. Because
m ≤ n - 1, the for loop of lines 7-10 executes at most n - 3 times. Since Push
takes O(1) time, each iteration takes O(1) time exclusive of the time spent in the
while loop of lines 8-9, and thus overall the for loop takes O(n) time exclusive of
the nested while loop.

We use aggregate analysis to show that the while loop takes O(n) time overall.
For i = 0, 1, ..., m, we push each point p_i onto stack S exactly once. As in the
analysis of the Multipop procedure of Section 17.1, we observe that we can pop at
most the number of items that we push. At least three points (p0, p1, and p_m) are
never popped from the stack, so that in fact at most m - 2 Pop operations are
performed in total. Each iteration of the while loop performs one Pop, and so
there are at most m - 2 iterations of the while loop altogether. Since the test in
line 8 takes O(1) time, each call of Pop takes O(1) time, and m ≤ n - 1, the total
time taken by the while loop is O(n). Thus, the running time of Graham-Scan
is O(n lg n).

Figure 33.9 The operation of Jarvis’s march. We choose the first vertex as the lowest point p0.
The next vertex, p1, has the smallest polar angle of any point with respect to p0. Then, p2 has the
smallest polar angle with respect to p1. The right chain goes as high as the highest point p3. Then,
we construct the left chain by finding smallest polar angles with respect to the negative x-axis.

Jarvis’s  march 

Jarvis’s march computes the convex hull of a set Q of points by a technique known
as package wrapping (or gift wrapping). The algorithm runs in time O(nh),
where h is the number of vertices of CH(Q). When h is o(lg n), Jarvis’s march is
asymptotically faster than Graham’s scan.

Intuitively,  Jarvis’s  march  simulates  wrapping  a  taut  piece  of  paper  around  the 
set  Q.  We  start  by  taping  the  end  of  the  paper  to  the  lowest  point  in  the  set,  that  is, 
to  the  same  point  p0  with  which  we  start  Graham’s  scan.  We  know  that  this  point 
must  be  a  vertex  of  the  convex  hull.  We  pull  the  paper  to  the  right  to  make  it  taut, 
and  then  we  pull  it  higher  until  it  touches  a  point.  This  point  must  also  be  a  vertex 
of  the  convex  hull.  Keeping  the  paper  taut,  we  continue  in  this  way  around  the  set 
of vertices until we come back to our original point p0.

More formally, Jarvis’s march builds a sequence H = ⟨p0, p1, ..., p_{h-1}⟩ of the
vertices of CH(Q). We start with p0. As Figure 33.9 shows, the next vertex p1
in the convex hull has the smallest polar angle with respect to p0. (In case of ties,
we choose the point farthest from p0.) Similarly, p2 has the smallest polar angle
with respect to p1, and so on. When we reach the highest vertex, say p_k (breaking
ties by choosing the farthest such vertex), we have constructed, as Figure 33.9
shows, the right chain of CH(Q). To construct the left chain, we start at p_k and
choose p_{k+1} as the point with the smallest polar angle with respect to p_k, but from
the negative x-axis. We continue on, forming the left chain by taking polar angles
from the negative x-axis, until we come back to our original vertex p0.

We could implement Jarvis’s march in one conceptual sweep around the convex
hull, that is, without separately constructing the right and left chains. Such
implementations typically keep track of the angle of the last convex-hull side chosen and
require the sequence of angles of hull sides to be strictly increasing (in the range
of 0 to 2π radians). The advantage of constructing separate chains is that we need
not explicitly compute angles; the techniques of Section 33.1 suffice to compare
angles.

If implemented properly, Jarvis’s march has a running time of O(nh). For each
of the h vertices of CH(Q), we find the vertex with the minimum polar angle. Each
comparison between polar angles takes O(1) time, using the techniques of
Section 33.1. As Section 9.1 shows, we can compute the minimum of n values in O(n)
time if each comparison takes O(1) time. Thus, Jarvis’s march takes O(nh) time.
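
The wrapping step can be sketched in Python as follows. Note the substitution: rather than building separate right and left chains as described above, this sketch uses the common single-sweep variant, which picks the next hull vertex with a cross-product orientation test relative to the current vertex and so also avoids computing angles explicitly. The names jarvis_march, cross, and dist2 are hypothetical, and points are assumed to be (x, y) tuples with exactly representable coordinates.

```python
def cross(o, a, b):
    """Cross product (a - o) x (b - o); negative means b lies to the right of o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def dist2(a, b):
    """Squared euclidean distance."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def jarvis_march(points):
    """Return the hull vertices in counterclockwise order, starting at p0."""
    pts = list(set(points))  # assumption: duplicates are simply dropped
    if len(pts) < 3:
        raise ValueError("need at least three distinct points")
    # Start at the lowest point, leftmost in case of a tie (the same p0 as
    # in Graham's scan).
    start = min(pts, key=lambda p: (p[1], p[0]))
    hull = []
    current = start
    while True:
        hull.append(current)
        # Any point other than current serves as the initial candidate.
        candidate = pts[0] if pts[0] != current else pts[1]
        for p in pts:
            if p == current:
                continue
            turn = cross(current, candidate, p)
            # Prefer p if it lies to the right of current -> candidate, or
            # if it is collinear with them and farther away (the tie rule).
            if turn < 0 or (turn == 0 and dist2(current, p) > dist2(current, candidate)):
                candidate = p
        current = candidate
        if current == start:  # wrapped all the way around
            return hull
```

Each pass of the outer loop scans all n points once and adds one hull vertex, which matches the O(nh) bound discussed above.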

Exercises 


33.3-1 

Prove that in the procedure Graham-Scan, points p1 and p_m must be vertices
of CH(Q).


33.3-2 

Consider a model of computation that supports addition, comparison, and
multiplication and for which there is a lower bound of Ω(n lg n) to sort n numbers. Prove
that Ω(n lg n) is a lower bound for computing, in order, the vertices of the convex
hull of a set of n points in such a model.


33.3-3 

Given a set of points Q, prove that the pair of points farthest from each other must
be vertices of CH(Q).


33.3-4 

For a given polygon P and a point q on its boundary, the shadow of q is the set
of points r such that the segment qr is entirely on the boundary or in the interior
of P. As Figure 33.10 illustrates, a polygon P is star-shaped if there exists a
point p in the interior of P that is in the shadow of every point on the boundary
of P. The set of all such points p is called the kernel of P. Given an n-vertex,
star-shaped polygon P specified by its vertices in counterclockwise order, show
how to compute CH(P) in O(n) time.

Figure 33.10 The definition of a star-shaped polygon, for use in Exercise 33.3-4. (a) A star-shaped
polygon. The segment from point p to any point q on the boundary intersects the boundary only at q.
(b) A non-star-shaped polygon. The shaded region on the left is the shadow of q, and the shaded
region on the right is the shadow of q'. Since these regions are disjoint, the kernel is empty.


33.3-5

In the on-line convex-hull problem, we are given the set Q of n points one point at
a time. After receiving each point, we compute the convex hull of the points seen
so far. Obviously, we could run Graham’s scan once for each point, with a total
running time of O(n² lg n). Show how to solve the on-line convex-hull problem in
a total of O(n²) time.


33.3-6 *

Show how to implement the incremental method for computing the convex hull
of n points so that it runs in O(n lg n) time.


33.4  Finding  the  closest  pair  of  points 

We now consider the problem of finding the closest pair of points in a set Q of
n ≥ 2 points. “Closest” refers to the usual euclidean distance: the distance between
points p1 = (x1, y1) and p2 = (x2, y2) is √((x1 - x2)² + (y1 - y2)²). Two points
in set Q may be coincident, in which case the distance between them is zero. This
problem has applications in, for example, traffic-control systems. A system for
controlling air or sea traffic might need to identify the two closest vehicles in order
to detect potential collisions.

A brute-force closest-pair algorithm simply looks at all the (n choose 2) = Θ(n²) pairs
of points. In this section, we shall describe a divide-and-conquer algorithm for
this problem, whose running time is described by the familiar recurrence T(n) =
2T(n/2) + O(n). Thus, this algorithm uses only O(n lg n) time.
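
For reference, the brute-force method can be sketched in a few lines of Python. This is only an illustrative sketch: the name closest_pair_brute_force is hypothetical, points are assumed to be (x, y) tuples, and math.dist (available in Python 3.8 and later) supplies the euclidean distance.

```python
from itertools import combinations
from math import dist


def closest_pair_brute_force(points):
    """Return (distance, p, q) for a closest pair, examining every pair."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # combinations(points, 2) yields each of the (n choose 2) unordered pairs once.
    p, q = min(combinations(points, 2), key=lambda pair: dist(pair[0], pair[1]))
    return dist(p, q), p, q
```

Because it inspects every pair, this sketch takes quadratic time, but it is a convenient baseline for checking a faster implementation.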

The  divide-and-conquer  algorithm 

Each recursive invocation of the algorithm takes as input a subset P ⊆ Q and
arrays X and Y, each of which contains all the points of the input subset P.
The points in array X are sorted so that their x-coordinates are monotonically
increasing. Similarly, array Y is sorted by monotonically increasing y-coordinate.
Note that in order to attain the O(n lg n) time bound, we cannot afford to sort
in each recursive call; if we did, the recurrence for the running time would be
T(n) = 2T(n/2) + O(n lg n), whose solution is T(n) = O(n lg² n). (Use the
version of the master method given in Exercise 4.6-2.) We shall see a little later
how to use “presorting” to maintain this sorted property without actually sorting in
each recursive call.
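
As a check on the O(n lg² n) solution quoted above, here is a short recursion-tree derivation. It is a sketch under simplifying assumptions that are not in the text: n is a power of 2, and the per-call sorting cost is bounded by c n lg n for some hypothetical constant c > 0. Level i of the tree has 2^i subproblems of size n/2^i, and the leaves contribute O(n) in total.

```latex
\begin{align*}
T(n) &\le 2\,T(n/2) + c\,n \lg n \\
     &\le \sum_{i=0}^{\lg n - 1} 2^i \cdot c\,\frac{n}{2^i} \lg\frac{n}{2^i} + O(n) \\
     &=   c\,n \sum_{i=0}^{\lg n - 1} (\lg n - i) + O(n) \\
     &=   c\,n \cdot \frac{\lg n\,(\lg n + 1)}{2} + O(n)
      =   O(n \lg^2 n).
\end{align*}
```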

A given recursive invocation with inputs P, X, and Y first checks whether
|P| ≤ 3. If so, the invocation simply performs the brute-force method described
above: try all (|P| choose 2) pairs of points and return the closest pair. If |P| > 3, the
recursive invocation carries out the divide-and-conquer paradigm as follows.

Divide: Find a vertical line l that bisects the point set P into two sets P_L and P_R
such that |P_L| = ⌈|P|/2⌉, |P_R| = ⌊|P|/2⌋, all points in P_L are on or to the
left of line l, and all points in P_R are on or to the right of l. Divide the array X
into arrays X_L and X_R, which contain the points of P_L and P_R respectively,
sorted by monotonically increasing x-coordinate. Similarly, divide the array Y
into arrays Y_L and Y_R, which contain the points of P_L and P_R respectively,
sorted by monotonically increasing y-coordinate.

Conquer: Having divided P into P_L and P_R, make two recursive calls, one to find
the closest pair of points in P_L and the other to find the closest pair of points
in P_R. The inputs to the first call are the subset P_L and arrays X_L and Y_L; the
second call receives the inputs P_R, X_R, and Y_R. Let the closest-pair distances
returned for P_L and P_R be δ_L and δ_R, respectively, and let δ = min(δ_L, δ_R).

Combine: The closest pair is either the pair with distance δ found by one of the
recursive calls, or it is a pair of points with one point in P_L and the other in P_R.
The algorithm determines whether there is a pair with one point in P_L and the
other point in P_R and whose distance is less than δ. Observe that if a pair of
points has distance less than δ, both points of the pair must be within δ units
of line l. Thus, as Figure 33.11(a) shows, they both must reside in the 2δ-wide
vertical strip centered at line l. To find such a pair, if one exists, we do the
following:



1.  Create an array Y′, which is the array Y with all points not in the 2δ-wide
vertical strip removed. The array Y′ is sorted by y-coordinate, just as Y is.

2.  For each point p in the array Y′, try to find points in Y′ that are within δ
units of p. As we shall see shortly, only the 7 points in Y′ that follow p need
be considered. Compute the distance from p to each of these 7 points, and
keep track of the closest-pair distance δ′ found over all pairs of points in Y′.

3.  If δ′ < δ, then the vertical strip does indeed contain a closer pair than the
recursive calls found. Return this pair and its distance δ′. Otherwise, return
the closest pair and its distance δ found by the recursive calls.

The above description omits some implementation details that are necessary to
achieve the O(n lg n) running time. After proving the correctness of the algorithm,
we shall show how to implement the algorithm to achieve the desired time bound.
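
The divide, conquer, and combine steps can be sketched in Python as follows. This is a hedged sketch, not the book's implementation: the names closest_pair and _closest are hypothetical, points are assumed to be (x, y) tuples, and Y is kept as a list of indices into the x-sorted array X so that it can be split into Y_L and Y_R in linear time without re-sorting, even when points coincide or lie on the line l.

```python
from math import dist, inf


def closest_pair(points):
    """Return (distance, p, q) for a closest pair among at least two points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    X = sorted(points)                                # sorted by x-coordinate
    Y = sorted(range(len(X)), key=lambda i: X[i][1])  # indices into X, by y
    return _closest(X, 0, len(X), Y)


def _closest(X, lo, hi, Y):
    """Solve the subproblem P = X[lo:hi]; Y lists the indices lo..hi-1 by y."""
    n = hi - lo
    if n <= 3:
        # Base case: brute force over all pairs.
        best = (inf, None, None)
        for a in range(lo, hi):
            for b in range(a + 1, hi):
                d = dist(X[a], X[b])
                if d < best[0]:
                    best = (d, X[a], X[b])
        return best

    # Divide: P_L gets the ceil(n/2) leftmost points; line l passes through
    # the rightmost point of P_L.
    mid = lo + (n + 1) // 2
    line_x = X[mid - 1][0]
    Y_L = [i for i in Y if i < mid]    # linear-time split, still sorted by y
    Y_R = [i for i in Y if i >= mid]

    # Conquer: delta is the smaller of the two distances found recursively.
    best = min(_closest(X, lo, mid, Y_L), _closest(X, mid, hi, Y_R),
               key=lambda t: t[0])
    delta = best[0]

    # Combine: Y' keeps the points of the 2*delta-wide strip, in y order;
    # each point is compared with at most the 7 points that follow it.
    strip = [i for i in Y if abs(X[i][0] - line_x) < delta]
    for k, i in enumerate(strip):
        for j in strip[k + 1:k + 8]:
            d = dist(X[i], X[j])
            if d < best[0]:
                best = (d, X[i], X[j])
    return best
```

Sorting happens only once, in closest_pair; every recursive call does linear work apart from its two recursive calls, which is the shape of the recurrence T(n) = 2T(n/2) + O(n).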

Correctness 

The correctness of this closest-pair algorithm is obvious, except for two aspects.
First, by bottoming out the recursion when |P| ≤ 3, we ensure that we never try to
solve a subproblem consisting of only one point. The second aspect is that we need
only check the 7 points following each point p in array Y′; we shall now prove this
property.

Suppose that at some level of the recursion, the closest pair of points is p_L ∈ P_L
and p_R ∈ P_R. Thus, the distance δ′ between p_L and p_R is strictly less than δ.
Point p_L must be on or to the left of line l and less than δ units away. Similarly, p_R
is on or to the right of l and less than δ units away. Moreover, p_L and p_R are
within δ units of each other vertically. Thus, as Figure 33.11(a) shows, p_L and p_R
are within a δ × 2δ rectangle centered at line l. (There may be other points within
this rectangle as well.)

We next show that at most 8 points of P can reside within this δ × 2δ rectangle.
Consider the δ × δ square forming the left half of this rectangle. Since all points
within P_L are at least δ units apart, at most 4 points can reside within this square;
Figure 33.11(b) shows how. Similarly, at most 4 points in P_R can reside within
the δ × δ square forming the right half of the rectangle. Thus, at most 8 points of P
can reside within the δ × 2δ rectangle. (Note that since points on line l may be in
either P_L or P_R, there may be up to 4 points on l. This limit is achieved if there are
two pairs of coincident points such that each pair consists of one point from P_L and
one point from P_R, one pair is at the intersection of l and the top of the rectangle,
and the other pair is where l intersects the bottom of the rectangle.)

Figure 33.11  Key concepts in the proof that the closest-pair algorithm needs to check only 7 points
following each point in the array Y'. (a) If p_L ∈ P_L and p_R ∈ P_R are less than δ units apart, they
must reside within a δ × 2δ rectangle centered at line l. (b) How 4 points that are pairwise at least δ
units apart can all reside within a δ × δ square. On the left are 4 points in P_L, and on the right are 4
points in P_R. The δ × 2δ rectangle can contain 8 points if the points shown on line l are actually
pairs of coincident points with one point in P_L and one in P_R.

Having shown that at most 8 points of P can reside within the rectangle, we
can easily see why we need to check only the 7 points following each point in the
array Y'. Still assuming that the closest pair is p_L and p_R, let us assume without
loss of generality that p_L precedes p_R in array Y'. Then, even if p_L occurs as early
as possible in Y' and p_R occurs as late as possible, p_R is in one of the 7 positions
following p_L. Thus, we have shown the correctness of the closest-pair algorithm.
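
To make the 7-point bound concrete, the following Python sketch scans a strip array and compares each point with at most the 7 points that follow it. It assumes that the strip is already sorted by y-coordinate and that points are (x, y) tuples; the name closest_in_strip is our own illustrative choice and does not appear in the pseudocode of this section.

```python
from math import dist


def closest_in_strip(strip, delta):
    """Return the smallest distance among pairs in `strip` that lie at most
    7 positions apart, or `delta` if no such pair is closer than `delta`.

    `strip` is assumed to hold (x, y) tuples sorted by y-coordinate, all
    within `delta` of the dividing line (the array Y' of the text).
    """
    best = delta
    for i, p in enumerate(strip):
        # Only the 7 points following p in y-order need to be examined.
        for q in strip[i + 1 : i + 8]:
            best = min(best, dist(p, q))
    return best
```

Because the inner loop runs at most 7 times per point, the scan takes time linear in the length of the strip.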

Implementation  and  running  time 

As we have noted, our goal is to have the recurrence for the running time be T(n) =
2T(n/2) + O(n), where T(n) is the running time for a set of n points. The main
difficulty comes from ensuring that the arrays X_L, X_R, Y_L, and Y_R, which are
passed to recursive calls, are sorted by the proper coordinate and also that the
array Y' is sorted by y-coordinate. (Note that if the array X that is received by a
recursive call is already sorted, then we can easily divide set P into P_L and P_R in
linear time.)

The key observation is that in each call, we wish to form a sorted subset of a
sorted array. For example, a particular invocation receives the subset P and the
array Y, sorted by y-coordinate. Having partitioned P into P_L and P_R, it needs to
form the arrays Y_L and Y_R, which are sorted by y-coordinate, in linear time. We
can view the method as the opposite of the MERGE procedure from merge sort in
Section 2.3.1: we are splitting a sorted array into two sorted arrays. The following
pseudocode gives the idea.

let Y_L[1 .. Y.length] and Y_R[1 .. Y.length] be new arrays
Y_L.length = Y_R.length = 0
for i = 1 to Y.length
    if Y[i] ∈ P_L
        Y_L.length = Y_L.length + 1
        Y_L[Y_L.length] = Y[i]
    else Y_R.length = Y_R.length + 1
        Y_R[Y_R.length] = Y[i]

We simply examine the points in array Y in order. If a point Y[i] is in P_L, we
append it to the end of array Y_L; otherwise, we append it to the end of array Y_R.
Similar pseudocode works for forming arrays X_L, X_R, and Y'.
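
The same splitting step might look as follows in Python. This is a minimal sketch under two assumptions of ours: points are hashable (x, y) tuples, and membership in P_L is tested with a set, which gives expected rather than worst-case constant time per test. The name split_sorted is hypothetical.

```python
def split_sorted(Y, left_points):
    """Split Y (sorted by y-coordinate) into Y_L and Y_R, preserving order.

    `left_points` is a set containing exactly the points of P_L.
    """
    Y_L, Y_R = [], []
    for p in Y:
        # Appending in scan order keeps both outputs sorted by y-coordinate.
        if p in left_points:
            Y_L.append(p)
        else:
            Y_R.append(p)
    return Y_L, Y_R
```

Each point of Y is examined once, so the split takes linear time, which is what the recurrence requires.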

The  only  remaining  question  is  how  to  get  the  points  sorted  in  the  first  place.  We 
presort  them;  that  is,  we  sort  them  once  and  for  all  before  the  first  recursive  call. 
We  pass  these  sorted  arrays  into  the  first  recursive  call,  and  from  there  we  whittle 
them  down  through  the  recursive  calls  as  necessary.  Presorting  adds  an  additional 
O(n lg n) term to the running time, but now each step of the recursion takes linear
time exclusive of the recursive calls. Thus, if we let T(n) be the running time of
each recursive step and T'(n) be the running time of the entire algorithm, we get
T'(n) = T(n) + O(n lg n) and

\[
T(n) = \begin{cases}
2T(n/2) + O(n) & \text{if } n > 3, \\
O(1)           & \text{if } n \le 3.
\end{cases}
\]

Thus, T(n) = O(n lg n) and T'(n) = O(n lg n).
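
As a quick check on this bound, one can unroll the recurrence. The derivation below is a sketch that assumes n is a power of 2 with n ≥ 4 and writes the O(n) term as cn for a constant c > 0; neither simplification is needed for the result, but both shorten the algebra.

```latex
\begin{align*}
T(n) &\le 2T(n/2) + cn \\
     &\le 4T(n/4) + 2cn \\
     &\;\;\vdots \\
     &\le 2^{k}\,T(n/2^{k}) + kcn .
\end{align*}
% Stop at k = \lg n - 1, where the subproblem size n/2^k = 2 is a base case:
\[
T(n) \le \frac{n}{2}\,T(2) + cn(\lg n - 1) = O(n) + O(n \lg n) = O(n \lg n),
\qquad
T'(n) = T(n) + O(n \lg n) = O(n \lg n).
\]
```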

Exercises 


33.4-1 

Professor Williams comes up with a scheme that allows the closest-pair algorithm
to check only 5 points following each point in array Y'. The idea is always to place
points on line l into set P_L. Then, there cannot be pairs of coincident points on
line l with one point in P_L and one in P_R. Thus, at most 6 points can reside in
the δ × 2δ rectangle. What is the flaw in the professor’s scheme?


33.4-2 

Show that it actually suffices to check only the points in the 5 array positions
following each point in the array Y'.


33.4-3

We can define the distance between two points in ways other than euclidean. In
the plane, the L_m-distance between points p_1 and p_2 is given by the
expression (|x_1 − x_2|^m + |y_1 − y_2|^m)^(1/m). Euclidean distance, therefore, is L_2-distance.
Modify the closest-pair algorithm to use the L_1-distance, which is also known as
the Manhattan distance.


33.4-4 

Given two points p_1 and p_2 in the plane, the L_∞-distance between them is
given by max(|x_1 − x_2|, |y_1 − y_2|). Modify the closest-pair algorithm to use the
L_∞-distance.


33.4-5 

Suppose that Ω(n) of the points given to the closest-pair algorithm are covertical.
Show how to determine the sets P_L and P_R and how to determine whether each
point of Y is in P_L or P_R so that the running time for the closest-pair algorithm
remains O(n lg n).


33.4-6 

Suggest a change to the closest-pair algorithm that avoids presorting the Y array
but leaves the running time as O(n lg n). (Hint: Merge sorted arrays Y_L and Y_R to
form the sorted array Y.)


Problems 


33-1  Convex  layers 

Given a set Q of points in the plane, we define the convex layers of Q inductively.
The first convex layer of Q consists of those points in Q that are vertices of CH(Q).
For i > 1, define Q_i to consist of the points of Q with all points in convex layers
1, 2, ..., i − 1 removed. Then, the ith convex layer of Q is CH(Q_i) if Q_i ≠ ∅ and
is undefined otherwise.

a. Give an O(n^2)-time algorithm to find the convex layers of a set of n points.

b. Prove that Ω(n lg n) time is required to compute the convex layers of a set of n
points with any model of computation that requires Ω(n lg n) time to sort n real
numbers.


33-2  Maximal layers

Let Q be a set of n points in the plane. We say that point (x, y) dominates
point (x', y') if x ≥ x' and y ≥ y'. A point in Q that is dominated by no other
points in Q is said to be maximal. Note that Q may contain many maximal points,
which can be organized into maximal layers as follows. The first maximal layer L_1
is the set of maximal points of Q. For i > 1, the ith maximal layer L_i is the set of
maximal points in Q − ⋃_{j=1}^{i−1} L_j.

Suppose that Q has k nonempty maximal layers, and let y_i be the y-coordinate
of the leftmost point in L_i for i = 1, 2, ..., k. For now, assume that no two points
in Q have the same x- or y-coordinate.

a. Show that y_1 > y_2 > ··· > y_k.

Consider a point (x, y) that is to the left of any point in Q and for which y is
distinct from the y-coordinate of any point in Q. Let Q' = Q ∪ {(x, y)}.

b. Let j be the minimum index such that y_j < y, unless y < y_k, in which case
we let j = k + 1. Show that the maximal layers of Q' are as follows:

• If j ≤ k, then the maximal layers of Q' are the same as the maximal layers
of Q, except that L_j also includes (x, y) as its new leftmost point.

• If j = k + 1, then the first k maximal layers of Q' are the same as for Q, but
in addition, Q' has a nonempty (k + 1)st maximal layer: L_{k+1} = {(x, y)}.

c. Describe an O(n lg n)-time algorithm to compute the maximal layers of a set Q
of n points. (Hint: Move a sweep line from right to left.)

d. Do any difficulties arise if we now allow input points to have the same x- or
y-coordinate? Suggest a way to resolve such problems.

33-3  Ghostbusters  and  ghosts 

A group of n Ghostbusters is battling n ghosts. Each Ghostbuster carries a proton
pack, which shoots a stream at a ghost, eradicating it. A stream goes in a straight
line and terminates when it hits the ghost. The Ghostbusters decide upon the
following strategy. They will pair off with the ghosts, forming n Ghostbuster-ghost
pairs, and then simultaneously each Ghostbuster will shoot a stream at his chosen
ghost. As we all know, it is very dangerous to let streams cross, and so the
Ghostbusters must choose pairings for which no streams will cross.

Assume  that  the  position  of  each  Ghostbuster  and  each  ghost  is  a  fixed  point  in 
the  plane  and  that  no  three  positions  are  colinear. 

a. Argue that there exists a line passing through one Ghostbuster and one ghost
such that the number of Ghostbusters on one side of the line equals the number
of ghosts on the same side. Describe how to find such a line in O(n lg n) time.

b. Give an O(n^2 lg n)-time algorithm to pair Ghostbusters with ghosts in such a
way that no streams cross.

33-4  Picking  up  sticks 

Professor  Charon  has  a  set  of  n  sticks,  which  are  piled  up  in  some  configuration. 
Each  stick  is  specified  by  its  endpoints,  and  each  endpoint  is  an  ordered  triple 
giving its (x, y, z) coordinates. No stick is vertical. He wishes to pick up all the
sticks,  one  at  a  time,  subject  to  the  condition  that  he  may  pick  up  a  stick  only  if 
there  is  no  other  stick  on  top  of  it. 

a.  Give  a  procedure  that  takes  two  sticks  a  and  b  and  reports  whether  a  is  above, 
below,  or  unrelated  to  b. 

b.  Describe  an  efficient  algorithm  that  determines  whether  it  is  possible  to  pick  up 
all  the  sticks,  and  if  so,  provides  a  legal  order  in  which  to  pick  them  up. 

33-5  Sparse-hulled  distributions 

Consider the problem of computing the convex hull of a set of points in the plane
that have been drawn according to some known random distribution. Sometimes,
the number of points, or size, of the convex hull of n points drawn from such a
distribution has expectation O(n^(1−ε)) for some constant ε > 0. We call such a
distribution sparse-hulled. Sparse-hulled distributions include the following:

• Points drawn uniformly from a unit-radius disk. The convex hull has expected
size Θ(n^(1/3)).

• Points drawn uniformly from the interior of a convex polygon with k sides, for
any constant k. The convex hull has expected size Θ(lg n).

• Points drawn according to a two-dimensional normal distribution. The convex
hull has expected size Θ(√(lg n)).

a. Given two convex polygons with n_1 and n_2 vertices respectively, show how to
compute the convex hull of all n_1 + n_2 points in O(n_1 + n_2) time. (The polygons
may overlap.)

b. Show how to compute the convex hull of a set of n points drawn independently
according to a sparse-hulled distribution in O(n) average-case time. (Hint:
Recursively find the convex hulls of the first n/2 points and the second n/2
points, and then combine the results.)


Chapter notes

This  chapter  barely  scratches  the  surface  of  computational-geometry  algorithms 
and  techniques.  Books  on  computational  geometry  include  those  by  Preparata  and 
Shamos  [282],  Edelsbrunner  [99],  and  O’Rourke  [269]. 

Although geometry has been studied since antiquity, the development of
algorithms for geometric problems is relatively new. Preparata and Shamos note that
the  earliest  notion  of  the  complexity  of  a  problem  was  given  by  E.  Lemoine  in  1902. 
He  was  studying  euclidean  constructions— those  using  a  compass  and  a  ruler— and 
devised  a  set  of  five  primitives:  placing  one  leg  of  the  compass  on  a  given  point, 
placing  one  leg  of  the  compass  on  a  given  line,  drawing  a  circle,  passing  the  ruler’s 
edge  through  a  given  point,  and  drawing  a  line.  Lemoine  was  interested  in  the 
number  of  primitives  needed  to  effect  a  given  construction;  he  called  this  amount 
the  “simplicity”  of  the  construction. 

The algorithm of Section 33.2, which determines whether any segments
intersect, is due to Shamos and Hoey [313].

The original version of Graham’s scan is given by Graham [150]. The
package-wrapping algorithm is due to Jarvis [189]. Using a decision-tree model of
computation, Yao [359] proved a worst-case lower bound of Ω(n lg n) for the running
time of any convex-hull algorithm. When the number of vertices h of the
convex hull is taken into account, the prune-and-search algorithm of Kirkpatrick and
Seidel [206], which takes O(n lg h) time, is asymptotically optimal.

The O(n lg n)-time divide-and-conquer algorithm for finding the closest pair of
points is by Shamos and appears in Preparata and Shamos [282]. Preparata and
Shamos also show that the algorithm is asymptotically optimal in a decision-tree
model.


34  NP-Completeness

Almost all the algorithms we have studied thus far have been polynomial-time
algorithms: on inputs of size n, their worst-case running time is O(n^k) for some
constant k. You might wonder whether all problems can be solved in polynomial time.
The answer is no. For example, there are problems, such as Turing’s famous
“Halting Problem,” that cannot be solved by any computer, no matter how much time we
allow. There are also problems that can be solved, but not in time O(n^k) for any
constant k. Generally, we think of problems that are solvable by polynomial-time
algorithms as being tractable, or easy, and problems that require superpolynomial
time as being intractable, or hard.

The subject of this chapter, however, is an interesting class of problems, called
the “NP-complete” problems, whose status is unknown. No polynomial-time
algorithm has yet been discovered for an NP-complete problem, nor has anyone yet
been able to prove that no polynomial-time algorithm can exist for any one of them.
This so-called P ≠ NP question has been one of the deepest, most perplexing open
research problems in theoretical computer science since it was first posed in 1971.

Several  NP-complete  problems  are  particularly  tantalizing  because  they  seem 
on  the  surface  to  be  similar  to  problems  that  we  know  how  to  solve  in  polynomial 
time.  In  each  of  the  following  pairs  of  problems,  one  is  solvable  in  polynomial 
time  and  the  other  is  NP-complete,  but  the  difference  between  problems  appears  to 
be  slight: 

Shortest vs. longest simple paths: In Chapter 24, we saw that even with negative
edge weights, we can find shortest paths from a single source in a directed
graph G = (V, E) in O(VE) time. Finding a longest simple path between two
vertices is difficult, however. Merely determining whether a graph contains a
simple path with at least a given number of edges is NP-complete.

Euler tour vs. hamiltonian cycle: An Euler tour of a connected, directed graph
G = (V, E) is a cycle that traverses each edge of G exactly once, although
it is allowed to visit each vertex more than once. By Problem 22-3, we can
determine whether a graph has an Euler tour in only O(E) time and, in fact,
we can find the edges of the Euler tour in O(E) time. A hamiltonian cycle of
a directed graph G = (V, E) is a simple cycle that contains each vertex in V.
Determining whether a directed graph has a hamiltonian cycle is NP-complete.
(Later in this chapter, we shall prove that determining whether an undirected
graph has a hamiltonian cycle is NP-complete.)

2-CNF satisfiability vs. 3-CNF satisfiability: A boolean formula contains
variables whose values are 0 or 1; boolean connectives such as ∧ (AND), ∨ (OR),
and ¬ (NOT); and parentheses. A boolean formula is satisfiable if there exists
some assignment of the values 0 and 1 to its variables that causes it to evaluate
to 1. We shall define terms more formally later in this chapter, but informally, a
boolean formula is in k-conjunctive normal form, or k-CNF, if it is the AND
of clauses of ORs of exactly k variables or their negations. For example, the
boolean formula (x_1 ∨ ¬x_2) ∧ (¬x_1 ∨ x_3) ∧ (¬x_2 ∨ ¬x_3) is in 2-CNF. (It has
the satisfying assignment x_1 = 1, x_2 = 0, x_3 = 1.) Although we can
determine in polynomial time whether a 2-CNF formula is satisfiable, we shall see
later in this chapter that determining whether a 3-CNF formula is satisfiable is
NP-complete.
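
The 2-CNF example above can be checked mechanically. The Python sketch below encodes that formula and evaluates it under an assignment; the encoding of a literal as a (variable, negated) pair and the names FORMULA, evaluate, and satisfying_assignments are our own assumptions. Note that the search over all assignments is brute force and takes exponential time; it is not the polynomial-time method for 2-CNF satisfiability mentioned in the text.

```python
from itertools import product

# A literal is (variable index, negated?); a clause is a list of literals.
# This encodes (x1 OR NOT x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3).
FORMULA = [
    [(1, False), (2, True)],
    [(1, True), (3, False)],
    [(2, True), (3, True)],
]


def evaluate(formula, assignment):
    """Return True if `assignment` (dict: variable -> 0 or 1) satisfies `formula`."""
    return all(
        any(bool(assignment[var]) != negated for var, negated in clause)
        for clause in formula
    )


def satisfying_assignments(formula):
    """Brute force over all 2^n assignments (exponential time, illustration only)."""
    variables = sorted({var for clause in formula for var, _ in clause})
    found = []
    for values in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if evaluate(formula, assignment):
            found.append(assignment)
    return found
```

Calling evaluate(FORMULA, {1: 1, 2: 0, 3: 1}) confirms the satisfying assignment given in the text.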

NP-completeness  and  the  classes  P  and  NP 

Throughout this chapter, we shall refer to three classes of problems: P, NP, and
NPC, the latter class being the NP-complete problems. We describe them
informally here, and we shall define them more formally later on.

The  class  P  consists  of  those  problems  that  are  solvable  in  polynomial  time. 
More specifically, they are problems that can be solved in time O(n^k) for some
constant  k,  where  n  is  the  size  of  the  input  to  the  problem.  Most  of  the  problems 
examined  in  previous  chapters  are  in  P. 

The  class  NP  consists  of  those  problems  that  are  “verifiable”  in  polynomial  time. 
What  do  we  mean  by  a  problem  being  verifiable?  If  we  were  somehow  given  a 
“certificate”  of  a  solution,  then  we  could  verify  that  the  certificate  is  correct  in  time 
polynomial in the size of the input to the problem. For example, in the
hamiltonian-cycle problem, given a directed graph G = (V, E), a certificate would be a
sequence ⟨v_1, v_2, v_3, ..., v_|V|⟩ of |V| vertices. We could easily check in polynomial
time that (v_i, v_{i+1}) ∈ E for i = 1, 2, 3, ..., |V| − 1 and that (v_|V|, v_1) ∈ E as well.
As  another  example,  for  3-CNF  satisfiability,  a  certificate  would  be  an  assignment 
of  values  to  variables.  We  could  check  in  polynomial  time  that  this  assignment 
satisfies  the  boolean  formula. 
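
A verifier for the hamiltonian-cycle certificate might be sketched in Python as follows. The representation (a set of vertices, a set of directed edge pairs, and a list as the certificate) and the name verify_hamiltonian_cycle are assumptions of ours. Besides the edge checks described above, the sketch also confirms that the certificate lists every vertex exactly once, which the definition of a hamiltonian cycle requires.

```python
def verify_hamiltonian_cycle(vertices, edges, certificate):
    """Check in polynomial time that `certificate` is a hamiltonian cycle.

    vertices:    set of vertex labels
    edges:       set of directed (u, v) pairs
    certificate: sequence of vertices, read cyclically
    """
    n = len(vertices)
    # Every vertex must appear exactly once in the certificate.
    if len(certificate) != n or set(certificate) != set(vertices):
        return False
    # Consecutive vertices, including last back to first, must be joined by an edge.
    return all(
        (certificate[i], certificate[(i + 1) % n]) in edges for i in range(n)
    )
```

The check performs one set lookup per vertex, so it runs in time polynomial in the size of the graph, even though no polynomial-time method is known for finding such a certificate.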

Any  problem  in  P  is  also  in  NP,  since  if  a  problem  is  in  P  then  we  can  solve  it 
in  polynomial  time  without  even  being  supplied  a  certificate.  We  shall  formalize 
this notion later in this chapter, but for now we can believe that P ⊆ NP. The open
question  is  whether  or  not  P  is  a  proper  subset  of  NP. 


Informally, a problem is in the class NPC, and we refer to it as being
NP-complete, if it is in NP and is as “hard” as any problem in NP. We shall formally
define what it means to be as hard as any problem in NP later in this chapter.
In the meantime, we will state without proof that if any NP-complete problem
can be solved in polynomial time, then every problem in NP has a
polynomial-time algorithm. Most theoretical computer scientists believe that the NP-complete
problems are intractable, since given the wide range of NP-complete problems
that have been studied to date, without anyone having discovered a
polynomial-time solution to any of them, it would be truly astounding if all of them could
be solved in polynomial time. Yet, given the effort devoted thus far to proving
that NP-complete problems are intractable, without a conclusive outcome, we
cannot rule out the possibility that the NP-complete problems are in fact solvable
in polynomial time.

To  become  a  good  algorithm  designer,  you  must  understand  the  rudiments  of  the 
theory  of  NP-completeness.  If  you  can  establish  a  problem  as  NP-complete,  you 
provide  good  evidence  for  its  intractability.  As  an  engineer,  you  would  then  do 
better  to  spend  your  time  developing  an  approximation  algorithm  (see  Chapter  35) 
or  solving  a  tractable  special  case,  rather  than  searching  for  a  fast  algorithm  that 
solves  the  problem  exactly.  Moreover,  many  natural  and  interesting  problems  that 
on  the  surface  seem  no  harder  than  sorting,  graph  searching,  or  network  flow  are 
in fact NP-complete. Therefore, you should become familiar with this remarkable
class  of  problems. 

Overview  of  showing  problems  to  be  NP-complete 

The  techniques  we  use  to  show  that  a  particular  problem  is  NP-complete  differ 
fundamentally  from  the  techniques  used  throughout  most  of  this  book  to  design 
and  analyze  algorithms.  When  we  demonstrate  that  a  problem  is  NP-complete, 
we  are  making  a  statement  about  how  hard  it  is  (or  at  least  how  hard  we  think  it 
is),  rather  than  about  how  easy  it  is.  We  are  not  trying  to  prove  the  existence  of 
an  efficient  algorithm,  but  instead  that  no  efficient  algorithm  is  likely  to  exist.  In 
this way, NP-completeness proofs bear some similarity to the proof in Section 8.1
of an Ω(n lg n)-time lower bound for any comparison sort algorithm; the specific
techniques  used  for  showing  NP-completeness  differ  from  the  decision-tree  method 
used  in  Section  8.1,  however. 

We rely on three key concepts in showing a problem to be NP-complete:

Decision problems vs. optimization problems

Many problems of interest are optimization problems, in which each feasible (i.e.,
“legal”) solution has an associated value, and we wish to find a feasible solution
with the best value. For example, in a problem that we call SHORTEST-PATH,
we are given an undirected graph G and vertices u and v, and we wish to find a
path from u to v that uses the fewest edges. In other words, SHORTEST-PATH
is the single-pair shortest-path problem in an unweighted, undirected graph.
NP-completeness applies directly not to optimization problems, however, but to
decision problems, in which the answer is simply “yes” or “no” (or, more formally, “1”
or “0”).

Although NP-complete problems are confined to the realm of decision problems,
we can take advantage of a convenient relationship between optimization problems
and decision problems. We usually can cast a given optimization problem as a
related decision problem by imposing a bound on the value to be optimized. For
example, a decision problem related to SHORTEST-PATH is PATH: given a
directed graph G, vertices u and v, and an integer k, does a path exist from u to v
consisting of at most k edges?

The relationship between an optimization problem and its related decision
problem works in our favor when we try to show that the optimization problem is
“hard.” That is because the decision problem is in a sense “easier,” or at least “no
harder.” As a specific example, we can solve PATH by solving SHORTEST-PATH
and then comparing the number of edges in the shortest path found to the value
of the decision-problem parameter k. In other words, if an optimization
problem is easy, its related decision problem is easy as well. Stated in a way that has
more relevance to NP-completeness, if we can provide evidence that a decision
problem is hard, we also provide evidence that its related optimization problem is
hard. Thus, even though it restricts attention to decision problems, the theory of
NP-completeness often has implications for optimization problems as well.
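
The step from SHORTEST-PATH to PATH can be illustrated with a short Python sketch. It assumes the graph is given as a dictionary of adjacency lists (an undirected graph would list each edge in both directions) and uses breadth-first search for the optimization problem; the function names are hypothetical.

```python
from collections import deque


def shortest_path_length(adj, u, v):
    """Return the fewest edges on any path from u to v, or None if v is unreachable."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x]
        for y in adj.get(x, ()):
            if y not in dist:          # first visit gives the shortest distance
                dist[y] = dist[x] + 1
                queue.append(y)
    return None


def path(adj, u, v, k):
    """Decision problem PATH: is there a path from u to v with at most k edges?"""
    d = shortest_path_length(adj, u, v)
    return d is not None and d <= k
```

The decision procedure does nothing beyond one call to the optimization procedure and one comparison, which is the sense in which the decision problem is no harder.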

Reductions 

The above notion of showing that one problem is no harder or no easier than
another applies even when both problems are decision problems. We take advantage
of this idea in almost every NP-completeness proof, as follows. Let us consider a
decision problem A, which we would like to solve in polynomial time. We call the
input to a particular problem an instance of that problem; for example, in PATH,
an instance would be a particular graph G, particular vertices u and v of G, and a
particular integer k. Now suppose that we already know how to solve a different
decision problem B in polynomial time. Finally, suppose that we have a procedure
that transforms any instance α of A into some instance β of B with the following
characteristics:

• The transformation takes polynomial time.

• The answers are the same. That is, the answer for α is “yes” if and only if the
answer for β is also “yes.”

Figure 34.1  How to use a polynomial-time reduction algorithm to solve a decision problem A in
polynomial time, given a polynomial-time decision algorithm for another problem B. In polynomial
time, we transform an instance α of A into an instance β of B, we solve B in polynomial time, and
we use the answer for β as the answer for α.

We call such a procedure a polynomial-time reduction algorithm and, as
Figure 34.1 shows, it provides us a way to solve problem A in polynomial time:

1. Given an instance α of problem A, use a polynomial-time reduction algorithm
to transform it to an instance β of problem B.

2. Run the polynomial-time decision algorithm for B on the instance β.

3. Use the answer for β as the answer for α.

As long as each of these steps takes polynomial time, all three together do also, and
so we have a way to decide on α in polynomial time. In other words, by “reducing”
solving problem A to solving problem B, we use the “easiness” of B to prove the
“easiness” of A.
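
The three steps can be sketched as a small Python function that builds a decision procedure for A from a reduction and a decision procedure for B. Everything here is illustrative: the name solve_via_reduction is hypothetical, and the two toy problems (A asks whether a list of integers contains an even number, B asks whether a list contains 0) were chosen only because the reduction between them is easy to see.

```python
def solve_via_reduction(reduce_a_to_b, decide_b):
    """Build a decision procedure for A from a reduction A -> B and a decider for B."""
    def decide_a(alpha):
        beta = reduce_a_to_b(alpha)   # step 1: transform the instance
        return decide_b(beta)         # steps 2 and 3: decide B, reuse the answer
    return decide_a


# Toy problem B: does the list contain 0?
def contains_zero(numbers):
    return 0 in numbers


# Toy reduction from A to B: replace each integer by its remainder modulo 2,
# so the list contains an even number if and only if the new list contains 0.
def parities(numbers):
    return [x % 2 for x in numbers]


contains_even = solve_via_reduction(parities, contains_zero)
```

Both the reduction and the decider for B run in linear time here, so the composed procedure for A does as well.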

Recalling  that  NP-completeness  is  about  showing  how  hard  a  problem  is  rather 
than  how  easy  it  is,  we  use  polynomial-time  reductions  in  the  opposite  way  to  show 
that  a  problem  is  NP-complete.  Let  us  take  the  idea  a  step  further,  and  show  how  we 
could  use  polynomial-time  reductions  to  show  that  no  polynomial-time  algorithm 
can  exist  for  a  particular  problem  B.  Suppose  we  have  a  decision  problem  A  for 
which  we  already  know  that  no  polynomial-time  algorithm  can  exist.  (Let  us  not 
concern  ourselves  for  now  with  how  to  find  such  a  problem  A.)  Suppose  further 
that  we  have  a  polynomial-time  reduction  transforming  instances  of  A  to  instances 
of B. Now we can use a simple proof by contradiction to show that no
polynomial-time algorithm can exist for B. Suppose otherwise; i.e., suppose that B has a
polynomial-time  algorithm.  Then,  using  the  method  shown  in  Figure  34.1,  we 
would  have  a  way  to  solve  problem  A  in  polynomial  time,  which  contradicts  our 
assumption  that  there  is  no  polynomial-time  algorithm  for  A. 

For NP-completeness, we cannot assume that there is absolutely no
polynomial-time algorithm for problem A. The proof methodology is similar, however, in that
we prove that problem B is NP-complete on the assumption that problem A is also
NP-complete.

A first NP-complete problem

Because  the  technique  of  reduction  relies  on  having  a  problem  already  known  to 
be  NP-complete  in  order  to  prove  a  different  problem  NP-complete,  we  need  a 
“first”  NP-complete  problem.  The  problem  we  shall  use  is  the  circuit-satisfiability 
problem,  in  which  we  are  given  a  boolean  combinational  circuit  composed  of  AND, 
OR,  and  NOT  gates,  and  we  wish  to  know  whether  there  exists  some  set  of  boolean 
inputs  to  this  circuit  that  causes  its  output  to  be  1.  We  shall  prove  that  this  first 
problem  is  NP-complete  in  Section  34.3. 

Chapter  outline 

This chapter studies the aspects of NP-completeness that bear most directly on the
analysis of algorithms. In Section 34.1, we formalize our notion of “problem” and
define the complexity class P of polynomial-time solvable decision problems. We
also see how these notions fit into the framework of formal-language theory.
Section 34.2 defines the class NP of decision problems whose solutions are verifiable
in polynomial time. It also formally poses the P ≠ NP question.

Section  34.3  shows  we  can  relate  problems  via  polynomial-time  “reductions.” 
It  defines  NP-completeness  and  sketches  a  proof  that  one  problem,  called  “circuit 
satisfiability,”  is  NP-complete.  Having  found  one  NP-complete  problem,  we  show 
in  Section  34.4  how  to  prove  other  problems  to  be  NP-complete  much  more  simply 
by  the  methodology  of  reductions.  We  illustrate  this  methodology  by  showing  that 
two  formula-satisfiability  problems  are  NP-complete.  With  additional  reductions, 
we  show  in  Section  34.5  a  variety  of  other  problems  to  be  NP-complete. 


34.1  Polynomial  time 

We begin our study of NP-completeness by formalizing our notion of
polynomial-time solvable problems. We generally regard these problems as tractable, but for
philosophical, not mathematical, reasons. We can offer three supporting
arguments.

First,  although  we  may  reasonably  regal'd  a  problem  that  requires  time  0(»100) 
to  be  intractable,  very  few  practical  problems  require  time  on  the  order  of  such  a 
high-degree  polynomial.  The  polynomial-time  computable  problems  encountered 
in  practice  typically  require  much  less  time.  Experience  has  shown  that  once  the 
first  polynomial-time  algorithm  for  a  problem  has  been  discovered,  more  efficient 
algorithms  often  follow.  Even  if  the  current  best  algorithm  for  a  problem  has  a 
running  time  of  $\Theta(n^{100})$,  an  algorithm  with  a  much  better  running  time  will  likely
soon  be  discovered. 

Second,  for  many  reasonable  models  of  computation,  a  problem  that  can  be
solved  in  polynomial  time  in  one  model  can  be  solved  in  polynomial  time  in  an¬ 
other.  For  example,  the  class  of  problems  solvable  in  polynomial  time  by  the  serial 
random-access  machine  used  throughout  most  of  this  book  is  the  same  as  the  class 
of  problems  solvable  in  polynomial  time  on  abstract  Turing  machines.1  It  is  also 
the  same  as  the  class  of  problems  solvable  in  polynomial  time  on  a  parallel  com¬ 
puter  when  the  number  of  processors  grows  polynomially  with  the  input  size. 

Third,  the  class  of  polynomial-time  solvable  problems  has  nice  closure  proper¬ 
ties,  since  polynomials  are  closed  under  addition,  multiplication,  and  composition. 
For  example,  if  the  output  of  one  polynomial-time  algorithm  is  fed  into  the  input  of 
another,  the  composite  algorithm  is  polynomial.  Exercise  34.1-5  asks  you  to  show 
that  if  an  algorithm  makes  a  constant  number  of  calls  to  polynomial-time  subrou¬ 
tines  and  performs  an  additional  amount  of  work  that  also  takes  polynomial  time, 
then  the  running  time  of  the  composite  algorithm  is  polynomial. 

Abstract  problems 

To  understand  the  class  of  polynomial-time  solvable  problems,  we  must  first  have 
a  formal  notion  of  what  a  “problem”  is.  We  define  an  abstract  problem  Q  to  be  a 
binary  relation  on  a  set  $I$  of  problem  instances  and  a  set  $S$  of  problem  solutions.
For  example,  an  instance  for  SHORTEST-PATH  is  a  triple  consisting  of  a  graph 
and  two  vertices.  A  solution  is  a  sequence  of  vertices  in  the  graph,  with  perhaps 
the  empty  sequence  denoting  that  no  path  exists.  The  problem  SHORTEST-PATH 
itself  is  the  relation  that  associates  each  instance  of  a  graph  and  two  vertices  with 
a  shortest  path  in  the  graph  that  connects  the  two  vertices.  Since  shortest  paths  are 
not  necessarily  unique,  a  given  problem  instance  may  have  more  than  one  solution. 

This  formulation  of  an  abstract  problem  is  more  general  than  we  need  for  our 
purposes.  As  we  saw  above,  the  theory  of  NP-completeness  restricts  attention  to
decision  problems:  those  having  a  yes/no  solution.  In  this  case,  we  can  view  an
abstract  decision  problem  as  a  function  that  maps  the  instance  set  $I$  to  the  solution
set  $\{0, 1\}$.  For  example,  a  decision  problem  related  to  SHORTEST-PATH  is  the
problem  PATH  that  we  saw  earlier.  If  $i = \langle G, u, v, k \rangle$  is  an  instance  of  the  decision
problem  PATH,  then  $\mathrm{PATH}(i) = 1$  (yes)  if  a  shortest  path  from  $u$  to  $v$  has  at
most  $k$  edges,  and  $\mathrm{PATH}(i) = 0$  (no)  otherwise.  Many  abstract  problems  are  not
decision  problems,  but  rather  optimization  problems,  which  require  some  value  to 
be  minimized  or  maximized.  As  we  saw  above,  however,  we  can  usually  recast  an 
optimization  problem  as  a  decision  problem  that  is  no  harder. 


1See  Hopcroft  and  Ullman  [180]  or  Lewis  and  Papadimitriou  [236]  for  a  thorough  treatment  of  the
Turing  machine  model. 

Encodings

In  order  for  a  computer  program  to  solve  an  abstract  problem,  we  must  represent 
problem  instances  in  a  way  that  the  program  understands.  An  encoding  of  a  set  S 
of  abstract  objects  is  a  mapping  e  from  S  to  the  set  of  binary  strings.2  For  example, 
we  are  all  familiar  with  encoding  the  natural  numbers  N  =  {0, 1, 2,  3,  4, . . .}  as 
the  strings  {0, 1,  10,  11,  100, . . .}.  Using  this  encoding,  e(17)  =  10001.  If  you 
have  looked  at  computer  representations  of  keyboard  characters,  you  probably  have 
seen  the  ASCII  code,  where,  for  example,  the  encoding  of  A  is  1000001.  We  can 
encode  a  compound  object  as  a  binary  string  by  combining  the  representations  of 
its  constituent  parts.  Polygons,  graphs,  functions,  ordered  pairs,  programs— all  can 
be  encoded  as  binary  strings. 
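
The encodings mentioned above can be tried out directly. The following is a minimal Python sketch, not part of the original text: the function names are illustrative, and the pair encoding (doubling each bit of the first component and ending it with the marker 01) is only one hypothetical way to combine the representations of constituent parts.

```python
def encode_natural(k: int) -> str:
    """Binary encoding e(k) of a natural number k, e.g. 17 -> '10001'."""
    if k < 0:
        raise ValueError("natural numbers only")
    return format(k, "b")


def encode_char(ch: str) -> str:
    """7-bit ASCII code of a single character, e.g. 'A' -> '1000001'."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "07b")


def encode_pair(a: int, b: int) -> str:
    """Hypothetical encoding of the ordered pair (a, b) as one binary string.

    Each bit of e(a) is doubled, the marker '01' ends the first component,
    and e(b) follows unchanged, so the pair can be decoded unambiguously.
    """
    doubled = "".join(bit * 2 for bit in encode_natural(a))
    return doubled + "01" + encode_natural(b)


def decode_pair(s: str) -> tuple[int, int]:
    """Inverse of encode_pair: read doubled bits until the '01' marker."""
    i = 0
    bits = []
    while s[i:i + 2] != "01":
        bits.append(s[i])
        i += 2
    return int("".join(bits), 2), int(s[i + 2:], 2)
```

Any self-delimiting scheme would serve equally well; the only property used here is that the compound encoding can be split back into its parts.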

Thus,  a  computer  algorithm  that  “solves”  some  abstract  decision  problem  actu¬ 
ally  takes  an  encoding  of  a  problem  instance  as  input.  We  call  a  problem  whose 
instance  set  is  the  set  of  binary  strings  a  concrete  problem.  We  say  that  an
algorithm  solves  a  concrete  problem  in  time  $O(T(n))$  if,  when  it  is  provided  a  problem
instance  $i$  of  length  $n = |i|$,  the  algorithm  can  produce  the  solution  in  $O(T(n))$
time.3  A  concrete  problem  is  polynomial-time  solvable,  therefore,  if  there  exists
an  algorithm  to  solve  it  in  time  $O(n^k)$  for  some  constant  $k$.

We  can  now  formally  define  the  complexity  class  P  as  the  set  of  concrete  deci¬ 
sion  problems  that  are  polynomial-time  solvable. 

We  can  use  encodings  to  map  abstract  problems  to  concrete  problems.  Given 
an  abstract  decision  problem  $Q$  mapping  an  instance  set  $I$  to  $\{0, 1\}$,  an  encoding
$e : I \to \{0, 1\}^*$  can  induce  a  related  concrete  decision  problem,  which  we  denote
by  $e(Q)$.4  If  the  solution  to  an  abstract-problem  instance  $i \in I$  is  $Q(i) \in \{0, 1\}$,
then  the  solution  to  the  concrete-problem  instance  $e(i) \in \{0, 1\}^*$  is  also  $Q(i)$.  As  a
technicality,  some  binary  strings  might  represent  no  meaningful  abstract-problem
instance.  For  convenience,  we  shall  assume  that  any  such  string  maps  arbitrarily
to  0.  Thus,  the  concrete  problem  produces  the  same  solutions  as  the  abstract  prob¬ 
lem  on  binary-string  instances  that  represent  the  encodings  of  abstract-problem 
instances. 

We  would  like  to  extend  the  definition  of  polynomial-time  solvability  from  con¬ 
crete  problems  to  abstract  problems  by  using  encodings  as  the  bridge,  but  we  would 


2The  codomain  of  e  need  not  be  binary  strings;  any  set  of  strings  over  a  finite  alphabet  having  at 
least  2  symbols  will  do. 

3We  assume  that  the  algorithm’s  output  is  separate  from  its  input.  Because  it  takes  at  least  one  time 
step  to  produce  each  bit  of  the  output  and  the  algorithm  takes  $O(T(n))$  time  steps,  the  size  of  the
output  is  $O(T(n))$.

4We  denote  by  {0,  1}*  the  set  of  all  strings  composed  of  symbols  from  the  set  {0, 1}. 
like  the  definition  to  be  independent  of  any  particular  encoding.  That  is,  the
efficiency  of  solving  a  problem  should  not  depend  on  how  the  problem  is  encoded.
Unfortunately,  it  depends  quite  heavily  on  the  encoding.  For  example,  suppose  that
an  integer  $k$  is  to  be  provided  as  the  sole  input  to  an  algorithm,  and  suppose  that
the  running  time  of  the  algorithm  is  $\Theta(k)$.  If  the  integer  $k$  is  provided  in  unary  (a
string  of  $k$  1s),  then  the  running  time  of  the  algorithm  is  $O(n)$  on  length-$n$  inputs,
which  is  polynomial  time.  If  we  use  the  more  natural  binary  representation  of  the
integer  $k$,  however,  then  the  input  length  is  $n = \lfloor \lg k \rfloor + 1$.  In  this  case,  the
running  time  of  the  algorithm  is  $\Theta(k) = \Theta(2^n)$,  which  is  exponential  in  the  size  of  the
input.  Thus,  depending  on  the  encoding,  the  algorithm  runs  in  either  polynomial 
or  superpolynomial  time. 
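
To make the dependence on the encoding concrete, the sketch below measures the input length of the same integer k under unary and binary encodings and compares it with the step count of a stand-in algorithm that does exactly k units of work. The function names and the counting loop are illustrative assumptions, not something the text prescribes.

```python
def unary(k: int) -> str:
    """Unary encoding: a string of k 1s."""
    return "1" * k


def binary(k: int) -> str:
    """Ordinary binary encoding of k."""
    return format(k, "b")


def count_to(k: int) -> int:
    """Stand-in Theta(k) algorithm: performs exactly k loop iterations."""
    steps = 0
    for _ in range(k):
        steps += 1
    return steps


def compare_encodings(k: int) -> tuple[int, int, int]:
    """Return (unary input length, binary input length, steps taken), for k >= 1."""
    return len(unary(k)), len(binary(k)), count_to(k)


if __name__ == "__main__":
    n_unary, n_binary, steps = compare_encodings(1000)
    # The step count equals the unary length, but lies between
    # 2**(n_binary - 1) and 2**n_binary for the binary length.
    print(n_unary, n_binary, steps)
```

The work done is identical in both cases; only the yardstick n against which it is measured changes.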

How  we  encode  an  abstract  problem  matters  quite  a  bit  to  how  we  understand 
polynomial  time.  We  cannot  really  talk  about  solving  an  abstract  problem  without 
first  specifying  an  encoding.  Nevertheless,  in  practice,  if  we  rule  out  “expensive” 
encodings  such  as  unary  ones,  the  actual  encoding  of  a  problem  makes  little  dif¬ 
ference  to  whether  the  problem  can  be  solved  in  polynomial  time.  For  example, 
representing  integers  in  base  3  instead  of  binary  has  no  effect  on  whether  a  prob¬ 
lem  is  solvable  in  polynomial  time,  since  we  can  convert  an  integer  represented  in 
base  3  to  an  integer  represented  in  base  2  in  polynomial  time. 
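
The base-3 versus base-2 claim can be illustrated with a pair of conversion routines. This is a sketch under the assumption that a base-3 numeral is written over the alphabet {0, 1, 2}; the function names are made up for the example. Each routine performs a number of arithmetic operations that is linear in the length of its input, on integers whose size is itself linear in that length, so both run in polynomial time.

```python
def ternary_to_binary(s: str) -> str:
    """Convert a base-3 numeral (digits 0, 1, 2) to a base-2 numeral."""
    value = 0
    for digit in s:
        if digit not in ("0", "1", "2"):
            raise ValueError("not a base-3 numeral")
        value = 3 * value + int(digit)  # Horner's rule, one step per digit
    return format(value, "b")


def binary_to_ternary(s: str) -> str:
    """Convert a base-2 numeral to a base-3 numeral."""
    value = int(s, 2)
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, 3)
        digits.append(str(remainder))
    return "".join(reversed(digits))
```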

We  say  that  a  function  $f : \{0, 1\}^* \to \{0, 1\}^*$  is  polynomial-time  computable
if  there  exists  a  polynomial-time  algorithm  $A$  that,  given  any  input  $x \in \{0, 1\}^*$,
produces  as  output  $f(x)$.  For  some  set  $I$  of  problem  instances,  we  say  that  two
encodings  $e_1$  and  $e_2$  are  polynomially  related  if  there  exist  two  polynomial-time
computable  functions  $f_{12}$  and  $f_{21}$  such  that  for  any  $i \in I$,  we  have  $f_{12}(e_1(i)) = e_2(i)$
and  $f_{21}(e_2(i)) = e_1(i)$.5  That  is,  a  polynomial-time  algorithm  can  compute  the
encoding  $e_2(i)$  from  the  encoding  $e_1(i)$,  and  vice  versa.  If  two  encodings  $e_1$  and  $e_2$  of
an  abstract  problem  are  polynomially  related,  whether  the  problem  is  polynomial-time
solvable  or  not  is  independent  of  which  encoding  we  use,  as  the  following
lemma  shows.

Lemma  34.1 

Let  $Q$  be  an  abstract  decision  problem  on  an  instance  set  $I$,  and  let  $e_1$  and  $e_2$  be
polynomially  related  encodings  on  $I$.  Then,  $e_1(Q) \in \mathrm{P}$  if  and  only  if  $e_2(Q) \in \mathrm{P}$.


5Technically,  we  also  require  the  functions  $f_{12}$  and  $f_{21}$  to  “map  noninstances  to  noninstances.”
A  noninstance  of  an  encoding  $e$  is  a  string  $x \in \{0, 1\}^*$  such  that  there  is  no  instance  $i$  for  which
$e(i) = x$.  We  require  that  $f_{12}(x) = y$  for  every  noninstance  $x$  of  encoding  $e_1$,  where  $y$  is  some
noninstance  of  $e_2$,  and  that  $f_{21}(x') = y'$  for  every  noninstance  $x'$  of  $e_2$,  where  $y'$  is  some  noninstance
of  $e_1$.

Proof  We  need  only  prove  the  forward  direction,  since  the  backward  direction  is
symmetric.  Suppose,  therefore,  that  $e_1(Q)$  can  be  solved  in  time  $O(n^k)$  for  some
constant  $k$.  Further,  suppose  that  for  any  problem  instance  $i$,  the  encoding  $e_1(i)$
can  be  computed  from  the  encoding  $e_2(i)$  in  time  $O(n^c)$  for  some  constant  $c$,  where
$n = |e_2(i)|$.  To  solve  problem  $e_2(Q)$,  on  input  $e_2(i)$,  we  first  compute  $e_1(i)$  and
then  run  the  algorithm  for  $e_1(Q)$  on  $e_1(i)$.  How  long  does  this  take?  Converting
encodings  takes  time  $O(n^c)$,  and  therefore  $|e_1(i)| = O(n^c)$,  since  the  output  of
a  serial  computer  cannot  be  longer  than  its  running  time.  Solving  the  problem
on  $e_1(i)$  takes  time  $O(|e_1(i)|^k) = O(n^{ck})$,  which  is  polynomial  since  both  $c$  and  $k$
are  constants.  ■
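
The running-time bound in the proof can be written as one chain. The symbols are those of the proof, with $T(n)$ introduced here as a name for the total time on an input of length $n = |e_2(i)|$, and with the harmless assumption that $c \geq 1$ and $k \geq 1$ so that the second term dominates.

```latex
\begin{align*}
T(n) &= \underbrace{O(n^{c})}_{\text{compute } e_1(i)}
        + \underbrace{O\bigl(|e_1(i)|^{k}\bigr)}_{\text{solve } e_1(Q) \text{ on } e_1(i)} \\
     &= O(n^{c}) + O\bigl((n^{c})^{k}\bigr)
        && \text{since } |e_1(i)| = O(n^{c}) \\
     &= O(n^{ck})
        && \text{since } c \le ck \text{ when } k \ge 1 .
\end{align*}
```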

Thus,  whether  an  abstract  problem  has  its  instances  encoded  in  binary  or  base  3 
does  not  affect  its  “complexity,”  that  is,  whether  it  is  polynomial-time  solvable  or 
not;  but  if  instances  are  encoded  in  unary,  its  complexity  may  change.  In  order  to 
be  able  to  converse  in  an  encoding-independent  fashion,  we  shall  generally  assume 
that  problem  instances  are  encoded  in  any  reasonable,  concise  fashion,  unless  we 
specifically  say  otherwise.  To  be  precise,  we  shall  assume  that  the  encoding  of  an 
integer  is  polynomially  related  to  its  binary  representation,  and  that  the  encoding  of 
a  finite  set  is  polynomially  related  to  its  encoding  as  a  list  of  its  elements,  enclosed 
in  braces  and  separated  by  commas.  (ASCII  is  one  such  encoding  scheme.)  With 
such  a  “standard”  encoding  in  hand,  we  can  derive  reasonable  encodings  of  other 
mathematical  objects,  such  as  tuples,  graphs,  and  formulas.  To  denote  the  standard 
encoding  of  an  object,  we  shall  enclose  the  object  in  angle  braces.  Thus,  $\langle G \rangle$
denotes  the  standard  encoding  of  a  graph  G. 

As  long  as  we  implicitly  use  an  encoding  that  is  polynomially  related  to  this 
standard  encoding,  we  can  talk  directly  about  abstract  problems  without  reference 
to  any  particular  encoding,  knowing  that  the  choice  of  encoding  has  no  effect  on 
whether  the  abstract  problem  is  polynomial-time  solvable.  Henceforth,  we  shall 
generally  assume  that  all  problem  instances  are  binary  strings  encoded  using  the 
standard  encoding,  unless  we  explicitly  specify  the  contrary.  We  shall  also  typically 
neglect  the  distinction  between  abstract  and  concrete  problems.  You  should  watch 
out  for  problems  that  arise  in  practice,  however,  in  which  a  standard  encoding  is 
not  obvious  and  the  encoding  does  make  a  difference. 

A  formal-language  framework 

By  focusing  on  decision  problems,  we  can  take  advantage  of  the  machinery  of 
formal-language  theory.  Let’s  review  some  definitions  from  that  theory.  An 
alphabet  $\Sigma$  is  a  finite  set  of  symbols.  A  language  $L$  over  $\Sigma$  is  any  set  of
strings  made  up  of  symbols  from  $\Sigma$.  For  example,  if  $\Sigma = \{0, 1\}$,  the  set
$L = \{10, 11, 101, 111, 1011, 1101, 10001, \ldots\}$  is  the  language  of  binary  representations
of  prime  numbers.  We  denote  the  empty  string  by  $\varepsilon$,  the  empty  language
by  $\emptyset$,  and  the  language  of  all  strings  over  $\Sigma$  by  $\Sigma^*$.  For  example,  if  $\Sigma = \{0, 1\}$,
then  $\Sigma^* = \{\varepsilon, 0, 1, 00, 01, 10, 11, 000, \ldots\}$  is  the  set  of  all  binary  strings.  Every
language  $L$  over  $\Sigma$  is  a  subset  of  $\Sigma^*$.

We  can  perform  a  variety  of  operations  on  languages.  Set-theoretic  operations, 
such  as  union  and  intersection,  follow  directly  from  the  set-theoretic  definitions. 
We  define  the  complement  of  $L$  by  $\overline{L} = \Sigma^* - L$.  The  concatenation  $L_1 L_2$  of  two
languages  $L_1$  and  $L_2$  is  the  language

$L = \{x_1 x_2 : x_1 \in L_1 \text{ and } x_2 \in L_2\}$ .

The  closure  or  Kleene  star  of  a  language  $L$  is  the  language

$L^* = \{\varepsilon\} \cup L \cup L^2 \cup L^3 \cup \cdots$ ,

where  $L^k$  is  the  language  obtained  by  concatenating  $L$  to  itself  $k$  times.
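
These operations are easy to experiment with on finite languages. The sketch below represents a language as a Python set of strings; because the Kleene star and the complement are infinite in general, the illustrative helpers `kleene_star_up_to` and `complement_up_to` (names chosen for this example) return only the strings of length at most `max_len`.

```python
from itertools import product


def concatenate(L1: set, L2: set) -> set:
    """All strings x1 x2 with x1 in L1 and x2 in L2."""
    return {x1 + x2 for x1 in L1 for x2 in L2}


def power(L: set, k: int) -> set:
    """L concatenated with itself k times; the 0th power is {empty string}."""
    result = {""}
    for _ in range(k):
        result = concatenate(result, L)
    return result


def kleene_star_up_to(L: set, max_len: int) -> set:
    """The strings of the Kleene star of L that have length at most max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        longer = concatenate(frontier, L)
        frontier = {w for w in longer if len(w) <= max_len} - result
        result |= frontier
    return result


def complement_up_to(L: set, max_len: int, alphabet: str = "01") -> set:
    """The strings over the alphabet of length at most max_len that are not in L."""
    universe = {"".join(p)
                for n in range(max_len + 1)
                for p in product(alphabet, repeat=n)}
    return universe - L
```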

From  the  point  of  view  of  language  theory,  the  set  of  instances  for  any  decision 
problem  $Q$  is  simply  the  set  $\Sigma^*$,  where  $\Sigma = \{0, 1\}$.  Since  $Q$  is  entirely  characterized
by  those  problem  instances  that  produce  a  1  (yes)  answer,  we  can  view  $Q$  as
a  language  $L$  over  $\Sigma = \{0, 1\}$,  where

$L = \{x \in \Sigma^* : Q(x) = 1\}$ .

For  example,  the  decision  problem  PATH  has  the  corresponding  language 

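\[
\begin{aligned}
\mathrm{PATH} = \{\langle G, u, v, k \rangle :{} & G = (V, E) \text{ is an undirected graph,} \\
& u, v \in V, \\
& k \geq 0 \text{ is an integer, and} \\
& \text{there exists a path from } u \text{ to } v \text{ in } G \\
& \text{consisting of at most } k \text{ edges}\} .
\end{aligned}
\]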

(Where  convenient,  we  shall  sometimes  use  the  same  name— PATH  in  this  case— 
to  refer  to  both  a  decision  problem  and  its  corresponding  language.) 

The  formal-language  framework  allows  us  to  express  concisely  the  relation
between  decision  problems  and  algorithms  that  solve  them.  We  say  that  an
algorithm  $A$  accepts  a  string  $x \in \{0, 1\}^*$  if,  given  input  $x$,  the  algorithm’s
output  $A(x)$  is  1.  The  language  accepted  by  an  algorithm  $A$  is  the  set  of  strings
$L = \{x \in \{0, 1\}^* : A(x) = 1\}$,  that  is,  the  set  of  strings  that  the  algorithm  accepts.
An  algorithm  $A$  rejects  a  string  $x$  if  $A(x) = 0$.

Even  if  language  $L$  is  accepted  by  an  algorithm  $A$,  the  algorithm  will  not  necessarily
reject  a  string  $x \notin L$  provided  as  input  to  it.  For  example,  the  algorithm  may
loop  forever.  A  language  $L$  is  decided  by  an  algorithm  $A$  if  every  binary  string
in  $L$  is  accepted  by  $A$  and  every  binary  string  not  in  $L$  is  rejected  by  $A$.  A  language
$L$  is  accepted  in  polynomial  time  by  an  algorithm  $A$  if  it  is  accepted  by  $A$
and  if  in  addition  there  exists  a  constant  $k$  such  that  for  any  length-$n$  string  $x \in L$,
algorithm  $A$  accepts  $x$  in  time  $O(n^k)$.  A  language  $L$  is  decided  in  polynomial
time  by  an  algorithm  $A$  if  there  exists  a  constant  $k$  such  that  for  any  length-$n$  string
$x \in \{0, 1\}^*$,  the  algorithm  correctly  decides  whether  $x \in L$  in  time  $O(n^k)$.  Thus,
to  accept  a  language,  an  algorithm  need  only  produce  an  answer  when  provided  a
string  in  $L$,  but  to  decide  a  language,  it  must  correctly  accept  or  reject  every  string
in  $\{0, 1\}^*$.

As  an  example,  the  language  PATH  can  be  accepted  in  polynomial  time.  One 
polynomial-time  accepting  algorithm  verifies  that  G  encodes  an  undirected  graph, 
verifies  that  u  and  v  are  vertices  in  G,  uses  breadth-first  search  to  compute  a  short¬ 
est  path  from  u  to  v  in  G,  and  then  compares  the  number  of  edges  on  the  shortest 
path  obtained  with  k.  If  G  encodes  an  undirected  graph  and  the  path  found  from  u 
to  v  has  at  most  k  edges,  the  algorithm  outputs  1  and  halts.  Otherwise,  the  algo¬ 
rithm  runs  forever.  This  algorithm  does  not  decide  PATH,  however,  since  it  does 
not  explicitly  output  0  for  instances  in  which  a  shortest  path  has  more  than  k  edges. 
A  decision  algorithm  for  PATH  must  explicitly  reject  binary  strings  that  do  not  be¬ 
long  to  PATH.  For  a  decision  problem  such  as  PATH,  such  a  decision  algorithm  is 
easy  to  design:  instead  of  running  forever  when  there  is  not  a  path  from  u  to  v  with 
at  most  k  edges,  it  outputs  0  and  halts.  (It  must  also  output  0  and  halt  if  the  input 
encoding  is  faulty.)  For  other  problems,  such  as  Turing’s  Halting  Problem,  there 
exists  an  accepting  algorithm,  but  no  decision  algorithm  exists. 
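
The difference between accepting and deciding PATH can be sketched in Python. As a simplifying assumption, the instance is taken to be already decoded into an adjacency mapping `adj` (vertex to set of neighbors) together with `u`, `v`, and `k`, rather than given as a binary string; all function names are illustrative.

```python
from collections import deque


def bfs_distance(adj, u, v):
    """Number of edges on a shortest path from u to v, or None if unreachable."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x]
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return None


def is_undirected_graph(adj):
    """Well-formedness check: every neighbor is a vertex and edges are symmetric."""
    return all(y in adj and x in adj[y] for x in adj for y in adj[x])


def decide_path(adj, u, v, k):
    """Decider: always halts, returning 1 for members of PATH and 0 otherwise."""
    if not is_undirected_graph(adj) or u not in adj or v not in adj:
        return 0  # faulty instance: reject explicitly
    if not isinstance(k, int) or k < 0:
        return 0
    d = bfs_distance(adj, u, v)
    return 1 if d is not None and d <= k else 0


def accept_path(adj, u, v, k):
    """Accepter: outputs 1 on members of PATH and never halts on anything else."""
    if decide_path(adj, u, v, k) == 1:
        return 1
    while True:  # deliberately runs forever instead of rejecting
        pass
```

Only `decide_path` is safe to call on arbitrary inputs; calling `accept_path` on a non-member would not return.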

We  can  informally  define  a  complexity  class  as  a  set  of  languages,  membership 
in  which  is  determined  by  a  complexity  measure,  such  as  running  time,  of  an 
algorithm  that  determines  whether  a  given  string  x  belongs  to  language  L.  The 
actual  definition  of  a  complexity  class  is  somewhat  more  technical.6 

Using  this  language-theoretic  framework,  we  can  provide  an  alternative  defini¬ 
tion  of  the  complexity  class  P: 

$\mathrm{P} = \{L \subseteq \{0, 1\}^* : \text{there exists an algorithm } A \text{ that decides } L \text{ in polynomial time}\}$ .

In  fact,  P  is  also  the  class  of  languages  that  can  be  accepted  in  polynomial  time.

Theorem  34.2

P  =  {L  :  L  is  accepted  by  a  polynomial-time  algorithm}  . 

Proof  Because  the  class  of  languages  decided  by  polynomial-time  algorithms  is 
a  subset  of  the  class  of  languages  accepted  by  polynomial-time  algorithms,  we 
need  only  show  that  if  L  is  accepted  by  a  polynomial-time  algorithm,  it  is  de¬ 
cided  by  a  polynomial-time  algorithm.  Let  L  be  the  language  accepted  by  some 


6For  more  on  complexity  classes,  see  the  seminal  paper  by  Hartmanis  and  Stearns  [162].
polynomial-time  algorithm  $A$.  We  shall  use  a  classic  “simulation”  argument  to
construct  another  polynomial-time  algorithm  $A'$  that  decides  $L$.  Because  $A$  accepts
$L$  in  time  $O(n^k)$  for  some  constant  $k$,  there  also  exists  a  constant  $c$  such
that  $A$  accepts  $L$  in  at  most  $cn^k$  steps.  For  any  input  string  $x$,  the  algorithm  $A'$
simulates  $cn^k$  steps  of  $A$.  After  simulating  $cn^k$  steps,  algorithm  $A'$  inspects  the  behavior
of  $A$.  If  $A$  has  accepted  $x$,  then  $A'$  accepts  $x$  by  outputting  a  1.  If  $A$  has  not
accepted  $x$,  then  $A'$  rejects  $x$  by  outputting  a  0.  The  overhead  of  $A'$  simulating  $A$
does  not  increase  the  running  time  by  more  than  a  polynomial  factor,  and  thus  $A'$
is  a  polynomial-time  algorithm  that  decides  $L$.  ■
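
The simulation argument can be mimicked in Python by modeling the accepting algorithm as a generator that yields once per step and returns 1 when it accepts. This modeling, the helper name `make_decider`, and the toy accepter are all assumptions made for the sketch; in particular the constants `c` and `k` are simply passed in, which presumes that a bound on the running time is known.

```python
def make_decider(accepter, c: int, k: int):
    """Build A' from A by simulating at most c * n**k steps of A.

    accepter(x) must be a generator that yields once per step and returns 1
    on acceptance; it may run forever on strings outside the language.
    """
    def decider(x: str) -> int:
        budget = c * len(x) ** k
        run = accepter(x)
        # budget + 1 resumptions let a run with `budget` steps finish.
        for _ in range(budget + 1):
            try:
                next(run)
            except StopIteration as stop:
                return 1 if stop.value == 1 else 0
        return 0  # budget exhausted without acceptance: reject
    return decider


def accepts_some_one(x: str):
    """Toy accepter for {x : x contains a 1}; loops forever on other strings."""
    for ch in x:
        yield  # one step per symbol examined
        if ch == "1":
            return 1
    while True:
        yield
```

For example, `make_decider(accepts_some_one, 1, 1)` halts on every binary string, even though `accepts_some_one` itself never halts on strings of all 0s.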

Note  that  the  proof  of  Theorem  34.2  is  nonconstructive.  For  a  given  language 
$L \in \mathrm{P}$,  we  may  not  actually  know  a  bound  on  the  running  time  for  the  algorithm  $A$
that  accepts  $L$.  Nevertheless,  we  know  that  such  a  bound  exists,  and  therefore,  that
an  algorithm  $A'$  exists  that  can  check  the  bound,  even  though  we  may  not  be  able
to  find  the  algorithm  $A'$  easily.

Exercises 


34.1-1 

Define  the  optimization  problem  LONGEST-PATH-LENGTH  as  the  relation  that 
associates  each  instance  of  an  undirected  graph  and  two  vertices  with  the  number
of  edges  in  a  longest  simple  path  between  the  two  vertices.  Define  the  decision
problem  $\text{LONGEST-PATH} = \{\langle G, u, v, k \rangle : G = (V, E) \text{ is an undirected graph, } u, v \in V,\ k \geq 0 \text{ is an integer, and there exists a simple path from } u \text{ to } v \text{ in } G \text{ consisting of at least } k \text{ edges}\}$.
Show  that  the  optimization  problem  LONGEST-PATH-LENGTH  can  be  solved  in
polynomial  time  if  and  only  if  LONGEST-PATH  $\in$  P.


34.1-2 

Give  a  formal  definition  for  the  problem  of  finding  the  longest  simple  cycle  in  an 
undirected  graph.  Give  a  related  decision  problem.  Give  the  language  correspond¬ 
ing  to  the  decision  problem. 


34.1-3

Give  a  formal  encoding  of  directed  graphs  as  binary  strings  using  an  adjacency- 
matrix  representation.  Do  the  same  using  an  adjacency-list  representation.  Argue 
that  the  two  representations  are  polynomially  related. 

34.1-4

Is  the  dynamic-programming  algorithm  for  the  0-1  knapsack  problem  that  is  asked
for  in  Exercise  16.2-2  a  polynomial-time  algorithm?  Explain  your  answer. 

34.1-5

Show  that  if  an  algorithm  makes  at  most  a  constant  number  of  calls  to  polynomial¬ 
time  subroutines  and  performs  an  additional  amount  of  work  that  also  takes  polyno¬ 
mial  time,  then  it  runs  in  polynomial  time.  Also  show  that  a  polynomial  number  of 
calls  to  polynomial-time  subroutines  may  result  in  an  exponential-time  algorithm. 


34.1-6 

Show  that  the  class  P,  viewed  as  a  set  of  languages,  is  closed  under  union,  intersection,
concatenation,  complement,  and  Kleene  star.  That  is,  if  $L_1, L_2 \in \mathrm{P}$,  then
$L_1 \cup L_2 \in \mathrm{P}$,  $L_1 \cap L_2 \in \mathrm{P}$,  $L_1 L_2 \in \mathrm{P}$,  $\overline{L_1} \in \mathrm{P}$,  and  $L_1^* \in \mathrm{P}$.


34.2  Polynomial-time  verification 

We  now  look  at  algorithms  that  verify  membership  in  languages.  For  example, 
suppose  that  for  a  given  instance  $\langle G, u, v, k \rangle$  of  the  decision  problem  PATH,  we
are  also  given  a  path  p  from  u  to  v.  We  can  easily  check  whether  p  is  a  path  in  G 
and  whether  the  length  of  p  is  at  most  k,  and  if  so,  we  can  view  p  as  a  “certificate” 
that  the  instance  indeed  belongs  to  PATH.  For  the  decision  problem  PATH,  this 
certificate  doesn’t  seem  to  buy  us  much.  After  all,  PATH  belongs  to  P— in  fact, 
we  can  solve  PATH  in  linear  time— and  so  verifying  membership  from  a  given 
certificate  takes  as  long  as  solving  the  problem  from  scratch.  We  shall  now  examine 
a  problem  for  which  we  know  of  no  polynomial-time  decision  algorithm  and  yet, 
given  a  certificate,  verification  is  easy. 

Hamiltonian  cycles 

The  problem  of  finding  a  hamiltonian  cycle  in  an  undirected  graph  has  been  stud¬ 
ied  for  over  a  hundred  years.  Formally,  a  hamiltonian  cycle  of  an  undirected  graph 
G  =  (V,  E)  is  a  simple  cycle  that  contains  each  vertex  in  V.  A  graph  that  con¬ 
tains  a  hamiltonian  cycle  is  said  to  be  hamiltonian ;  otherwise,  it  is  nonhamilto- 
nian.  The  name  honors  W.  R.  Hamilton,  who  described  a  mathematical  game  on 
the  dodecahedron  (Figure  34.2(a))  in  which  one  player  sticks  five  pins  in  any  five 
consecutive  vertices  and  the  other  player  must  complete  the  path  to  form  a  cycle 

Figure  34.2  (a)  A  graph  representing  the  vertices,  edges,  and  faces  of  a  dodecahedron,  with  a 
hamiltonian  cycle  shown  by  shaded  edges.  (b)  A  bipartite  graph  with  an  odd  number  of  vertices.
Any  such  graph  is  nonhamiltonian. 

containing  all  the  vertices.7  The  dodecahedron  is  hamiltonian,  and  Figure  34.2(a) 
shows  one  hamiltonian  cycle.  Not  all  graphs  are  hamiltonian,  however.  For  ex¬ 
ample,  Figure  34.2(b)  shows  a  bipartite  graph  with  an  odd  number  of  vertices. 
Exercise  34.2-2  asks  you  to  show  that  all  such  graphs  are  nonhamiltonian. 

We  can  define  the  hamiltonian-cycle  problem ,  “Does  a  graph  G  have  a  hamil¬ 
tonian  cycle?”  as  a  formal  language: 

$\text{HAM-CYCLE} = \{\langle G \rangle : G \text{ is a hamiltonian graph}\}$ .

How  might  an  algorithm  decide  the  language  HAM-CYCLE?  Given  a  problem 
instance  $\langle G \rangle$,  one  possible  decision  algorithm  lists  all  permutations  of  the  vertices
of  $G$  and  then  checks  each  permutation  to  see  if  it  is  a  hamiltonian  cycle.  What  is
the  running  time  of  this  algorithm?  If  we  use  the  “reasonable”  encoding  of  a  graph
as  its  adjacency  matrix,  the  number  $m$  of  vertices  in  the  graph  is  $\Omega(\sqrt{n})$,  where
$n = |\langle G \rangle|$  is  the  length  of  the  encoding  of  $G$.  There  are  $m!$  possible  permutations


7In  a  letter  dated  17  October  1856  to  his  friend  John  T.  Graves,  Hamilton  [157,  p.  624]  wrote,  “I 
have  found  that  some  young  persons  have  been  much  amused  by  trying  a  new  mathematical  game 
which  the  Icosion  furnishes,  one  person  sticking  five  pins  in  any  five  consecutive  points  . . .  and  the 
other  player  then  aiming  to  insert,  which  by  the  theory  in  this  letter  can  always  be  done,  fifteen  other 
pins,  in  cyclical  succession,  so  as  to  cover  all  the  other  points,  and  to  end  in  immediate  proximity  to 
the  pin  wherewith  his  antagonist  had  begun.” 
of  the  vertices,  and  therefore  the  running  time  is  $\Omega(m!) = \Omega(\sqrt{n}\,!) = \Omega(2^{\sqrt{n}})$,
which  is  not  $O(n^k)$  for  any  constant  $k$.  Thus,  this  naive  algorithm  does  not  run
in  polynomial  time.  In  fact,  the  hamiltonian-cycle  problem  is  NP-complete,  as  we 
shall  prove  in  Section  34.5. 
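
The naive decision procedure described above might look as follows in Python. The graph is assumed to be given as an adjacency mapping `adj` from each vertex to the set of its neighbors (an illustrative representation, not the adjacency-matrix encoding discussed in the text), and the function name is made up for the example. The loop may examine all m! orderings of the m vertices, which is why the procedure is not polynomial-time.

```python
from itertools import permutations


def has_hamiltonian_cycle_naive(adj) -> bool:
    """Try every ordering of the vertices; return True if one forms a cycle."""
    vertices = list(adj)
    m = len(vertices)
    if m < 3:
        return False  # a simple cycle in an undirected graph needs 3 vertices
    for order in permutations(vertices):
        # Consecutive vertices, including last back to first, must be adjacent.
        if all(order[(i + 1) % m] in adj[order[i]] for i in range(m)):
            return True
    return False
```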

Verification  algorithms 

Consider  a  slightly  easier  problem.  Suppose  that  a  friend  tells  you  that  a  given 
graph  G  is  hamiltonian,  and  then  offers  to  prove  it  by  giving  you  the  vertices  in 
order  along  the  hamiltonian  cycle.  It  would  certainly  be  easy  enough  to  verify  the 
proof:  simply  verify  that  the  provided  cycle  is  hamiltonian  by  checking  whether 
it  is  a  permutation  of  the  vertices  of  V  and  whether  each  of  the  consecutive  edges 
along  the  cycle  actually  exists  in  the  graph.  You  could  certainly  implement  this 
verification  algorithm  to  run  in  $O(n^2)$  time,  where  $n$  is  the  length  of  the  encoding
of  G.  Thus,  a  proof  that  a  hamiltonian  cycle  exists  in  a  graph  can  be  verified  in 
polynomial  time. 

We  define  a  verification  algorithm  as  being  a  two-argument  algorithm  A,  where 
one  argument  is  an  ordinary  input  string  x  and  the  other  is  a  binary  string  y  called 
a  certificate.  A  two-argument  algorithm  A  verifies  an  input  string  x  if  there  exists 
a  certificate  $y$  such  that  $A(x, y) = 1$.  The  language  verified  by  a  verification
algorithm  $A$  is

$L = \{x \in \{0, 1\}^* : \text{there exists } y \in \{0, 1\}^* \text{ such that } A(x, y) = 1\}$ .

Intuitively,  an  algorithm  $A$  verifies  a  language  $L$  if  for  any  string  $x \in L$,  there
exists  a  certificate  $y$  that  $A$  can  use  to  prove  that  $x \in L$.  Moreover,  for  any  string
$x \notin L$,  there  must  be  no  certificate  proving  that  $x \in L$.  For  example,  in  the
hamiltonian-cycle  problem,  the  certificate  is  the  list  of  vertices  in  some  hamilto¬ 
nian  cycle.  If  a  graph  is  hamiltonian,  the  hamiltonian  cycle  itself  offers  enough 
information  to  verify  this  fact.  Conversely,  if  a  graph  is  not  hamiltonian,  there 
can  be  no  list  of  vertices  that  fools  the  verification  algorithm  into  believing  that  the 
graph  is  hamiltonian,  since  the  verification  algorithm  carefully  checks  the  proposed 
“cycle”  to  be  sure. 
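
A verification algorithm for the hamiltonian-cycle problem can be sketched as a two-argument Python function. As before, the input graph is assumed to be an adjacency mapping `adj` and the certificate a list of vertices, both already decoded from their binary encodings; the name `verify_ham_cycle` is illustrative.

```python
def verify_ham_cycle(adj, certificate) -> int:
    """Return 1 if the certificate lists a hamiltonian cycle of the graph, else 0."""
    m = len(adj)
    if m < 3 or len(certificate) != m:
        return 0
    # Same length and same vertex set means the list is a permutation of V.
    if set(certificate) != set(adj):
        return 0
    # Every consecutive pair, including last back to first, must be an edge.
    for i in range(m):
        a, b = certificate[i], certificate[(i + 1) % m]
        if b not in adj[a]:
            return 0
    return 1
```

The checks take time polynomial in the size of the graph, and no certificate can make the function return 1 on a nonhamiltonian graph.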

The  complexity  class  NP

The  complexity  class  NP  is  the  class  of  languages  that  can  be  verified  by  a  poly¬ 
nomial-time  algorithm.8  More  precisely,  a  language  L  belongs  to  NP  if  and  only  if 
there  exist  a  two-input  polynomial-time  algorithm  A  and  a  constant  c  such  that 

$L = \{x \in \{0, 1\}^* : \text{there exists a certificate } y \text{ with } |y| = O(|x|^c) \text{ such that } A(x, y) = 1\}$ .

We  say  that  algorithm  A  verifies  language  L  in  polynomial  time. 

From  our  earlier  discussion  on  the  hamiltonian-cycle  problem,  we  now  see  that 
HAM-CYCLE  $\in$  NP.  (It  is  always  nice  to  know  that  an  important  set  is  nonempty.)
Moreover,  if  $L \in \mathrm{P}$,  then  $L \in \mathrm{NP}$,  since  if  there  is  a  polynomial-time  algorithm
to  decide  $L$,  the  algorithm  can  be  easily  converted  to  a  two-argument  verification
algorithm  that  simply  ignores  any  certificate  and  accepts  exactly  those  input  strings
it  determines  to  be  in  $L$.  Thus,  P  $\subseteq$  NP.
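
The conversion from a decider to a verifier is short enough to write out. In this sketch the helper `verifier_from_decider` and the toy language (binary strings with an even number of 1s) are illustrative assumptions.

```python
def verifier_from_decider(decide):
    """Wrap a one-argument decider as a two-argument verification algorithm."""
    def verify(x: str, y: str) -> int:
        # The certificate y is ignored; the decider alone settles membership.
        return decide(x)
    return verify


def decide_even_ones(x: str) -> int:
    """Toy polynomial-time decider: 1 if x has an even number of 1s, else 0."""
    return 1 if x.count("1") % 2 == 0 else 0
```

Since the wrapper adds only constant overhead, the verifier runs in polynomial time whenever the decider does, and even the empty certificate works for every string in the language.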

It  is  unknown  whether  P  =  NP,  but  most  researchers  believe  that  P  and  NP  are 
not  the  same  class.  Intuitively,  the  class  P  consists  of  problems  that  can  be  solved 
quickly.  The  class  NP  consists  of  problems  for  which  a  solution  can  be  verified 
quickly.  You  may  have  learned  from  experience  that  it  is  often  more  difficult  to 
solve  a  problem  from  scratch  than  to  verify  a  clearly  presented  solution,  especially 
when  working  under  time  constraints.  Theoretical  computer  scientists  generally 
believe  that  this  analogy  extends  to  the  classes  P  and  NP,  and  thus  that  NP  includes 
languages  that  are  not  in  P. 

There  is  more  compelling,  though  not  conclusive,  evidence  that  P  $\neq$  NP:  the
existence  of  languages  that  are  “NP-complete.”  We  shall  study  this  class  in  Sec¬ 
tion  34.3. 

Many  other  fundamental  questions  beyond  the  P  $\neq$  NP  question  remain  unresolved.
Figure  34.3  shows  some  possible  scenarios.  Despite  much  work  by  many
researchers,  no  one  even  knows  whether  the  class  NP  is  closed  under  complement.
That  is,  does  $L \in \mathrm{NP}$  imply  $\overline{L} \in \mathrm{NP}$?  We  can  define  the  complexity  class
co-NP  as  the  set  of  languages  $L$  such  that  $\overline{L} \in \mathrm{NP}$.  We  can  restate  the  question
of  whether  NP  is  closed  under  complement  as  whether  NP  =  co-NP.  Since  P  is
closed  under  complement  (Exercise  34.1-6),  it  follows  from  Exercise  34.2-9  that
P  $\subseteq$  NP  $\cap$  co-NP.  Once  again,  however,  no  one  knows  whether  P  =  NP  $\cap$  co-NP
or  whether  there  is  some  language  in  $(\mathrm{NP} \cap \text{co-NP}) - \mathrm{P}$.


⁸ The name “NP” stands for “nondeterministic polynomial time.” The class NP was originally studied
in the context of nondeterminism, but this book uses the somewhat simpler yet equivalent notion of
verification. Hopcroft and Ullman [180] give a good presentation of NP-completeness in terms of
nondeterministic models of computation.
Figure 34.3 Four possibilities for relationships among complexity classes. In each diagram, one
region enclosing another indicates a proper-subset relation. (a) $\mathrm{P} = \mathrm{NP} = \text{co-NP}$. Most researchers
regard this possibility as the most unlikely. (b) If NP is closed under complement, then $\mathrm{NP} = \text{co-NP}$,
but it need not be the case that $\mathrm{P} = \mathrm{NP}$. (c) $\mathrm{P} = \mathrm{NP} \cap \text{co-NP}$, but NP is not closed under complement.
(d) $\mathrm{NP} \neq \text{co-NP}$ and $\mathrm{P} \neq \mathrm{NP} \cap \text{co-NP}$. Most researchers regard this possibility as the most likely.


Thus, our understanding of the precise relationship between P and NP is woefully
incomplete. Nevertheless, even though we might not be able to prove that a
particular  problem  is  intractable,  if  we  can  prove  that  it  is  NP-complete,  then  we 
have  gained  valuable  information  about  it. 

Exercises 

34.2-1

Consider the language GRAPH-ISOMORPHISM = $\{\langle G_1, G_2 \rangle : G_1$ and $G_2$ are
isomorphic graphs$\}$. Prove that GRAPH-ISOMORPHISM $\in$ NP by describing a
polynomial-time algorithm to verify the language.

34.2-2

Prove that if $G$ is an undirected bipartite graph with an odd number of vertices,
then $G$ is nonhamiltonian.

34.2-3

Show that if HAM-CYCLE $\in$ P, then the problem of listing the vertices of a
hamiltonian cycle, in order, is polynomial-time solvable.
34.2-4

Prove that the class NP of languages is closed under union, intersection,
concatenation, and Kleene star. Discuss the closure of NP under complement.


34.2-5

Show that any language in NP can be decided by an algorithm running in
time $2^{O(n^k)}$ for some constant $k$.

34.2-6

A hamiltonian path in a graph is a simple path that visits every vertex exactly
once. Show that the language HAM-PATH = $\{\langle G, u, v \rangle :$ there is a hamiltonian
path from $u$ to $v$ in graph $G\}$ belongs to NP.


34.2-7 

Show  that  the  hamiltonian-path  problem  from  Exercise  34.2-6  can  be  solved  in 
polynomial  time  on  directed  acyclic  graphs.  Give  an  efficient  algorithm  for  the 
problem. 


34.2-8 

Let $\phi$ be a boolean formula constructed from the boolean input variables $x_1, x_2,
\ldots, x_k$, negations ($\neg$), ANDs ($\wedge$), ORs ($\vee$), and parentheses. The formula $\phi$ is a
tautology if it evaluates to 1 for every assignment of 1 and 0 to the input variables.
Define TAUTOLOGY as the language of boolean formulas that are tautologies.
Show that TAUTOLOGY $\in$ co-NP.


34.2-9

Prove that $\mathrm{P} \subseteq \text{co-NP}$.

34.2-10

Prove that if $\mathrm{NP} \neq \text{co-NP}$, then $\mathrm{P} \neq \mathrm{NP}$.

34.2-11

Let $G$ be a connected, undirected graph with at least 3 vertices, and let $G^3$ be the
graph obtained by connecting all pairs of vertices that are connected by a path in $G$
of length at most 3. Prove that $G^3$ is hamiltonian. (Hint: Construct a spanning tree
for $G$, and use an inductive argument.)
34.3 NP-completeness and reducibility

Perhaps the most compelling reason why theoretical computer scientists believe
that $\mathrm{P} \neq \mathrm{NP}$ comes from the existence of the class of “NP-complete” problems.
This class has the intriguing property that if any NP-complete problem can be
solved in polynomial time, then every problem in NP has a polynomial-time
solution, that is, $\mathrm{P} = \mathrm{NP}$. Despite years of study, though, no polynomial-time algorithm
has ever been discovered for any NP-complete problem.

The  language  HAM-CYCLE  is  one  NP-complete  problem.  If  we  could  decide 
HAM-CYCLE  in  polynomial  time,  then  we  could  solve  every  problem  in  NP  in 
polynomial time. In fact, if $\mathrm{NP} - \mathrm{P}$ should turn out to be nonempty, we could say
with certainty that HAM-CYCLE $\in \mathrm{NP} - \mathrm{P}$.

The  NP-complete  languages  are,  in  a  sense,  the  “hardest”  languages  in  NP.  In 
this  section,  we  shall  show  how  to  compare  the  relative  “hardness”  of  languages 
using  a  precise  notion  called  “polynomial-time  reducibility.”  Then  we  formally 
define  the  NP-complete  languages,  and  we  finish  by  sketching  a  proof  that  one 
such  language,  called  CIRCUIT-SAT,  is  NP-complete.  In  Sections  34.4  and  34.5, 
we  shall  use  the  notion  of  reducibility  to  show  that  many  other  problems  are  NP- 
complete. 

Reducibility 

Intuitively, a problem $Q$ can be reduced to another problem $Q'$ if any instance of $Q$
can be “easily rephrased” as an instance of $Q'$, the solution to which provides a
solution to the instance of $Q$. For example, the problem of solving linear equations
in an indeterminate $x$ reduces to the problem of solving quadratic equations. Given
an instance $ax + b = 0$, we transform it to $0x^2 + ax + b = 0$, whose solution
provides a solution to $ax + b = 0$. Thus, if a problem $Q$ reduces to another
problem $Q'$, then $Q$ is, in a sense, “no harder to solve” than $Q'$.

Returning to our formal-language framework for decision problems, we say that
a language $L_1$ is polynomial-time reducible to a language $L_2$, written $L_1 \le_P L_2$,
if there exists a polynomial-time computable function $f : \{0,1\}^* \to \{0,1\}^*$ such
that for all $x \in \{0,1\}^*$,

$x \in L_1$ if and only if $f(x) \in L_2$ . (34.1)

We call the function $f$ the reduction function, and a polynomial-time algorithm $F$
that computes $f$ is a reduction algorithm.

Figure 34.4 illustrates the idea of a polynomial-time reduction from a language
$L_1$ to another language $L_2$. Each language is a subset of $\{0,1\}^*$. The
reduction function $f$ provides a polynomial-time mapping such that if $x \in L_1$,
then $f(x) \in L_2$. Moreover, if $x \notin L_1$, then $f(x) \notin L_2$. Thus, the reduction
function maps any instance $x$ of the decision problem represented by the language $L_1$
to an instance $f(x)$ of the problem represented by $L_2$. Providing an answer to
whether $f(x) \in L_2$ directly provides the answer to whether $x \in L_1$.

Figure 34.4 An illustration of a polynomial-time reduction from a language $L_1$ to a language $L_2$
via a reduction function $f$. For any input $x \in \{0,1\}^*$, the question of whether $x \in L_1$ has the same
answer as the question of whether $f(x) \in L_2$.

Polynomial-time reductions give us a powerful tool for proving that various
languages belong to P.

Lemma 34.3

If $L_1, L_2 \subseteq \{0,1\}^*$ are languages such that $L_1 \le_P L_2$, then $L_2 \in \mathrm{P}$ implies
$L_1 \in \mathrm{P}$.

Proof Let $A_2$ be a polynomial-time algorithm that decides $L_2$, and let $F$ be a
polynomial-time reduction algorithm that computes the reduction function $f$. We
shall construct a polynomial-time algorithm $A_1$ that decides $L_1$.

Figure 34.5 illustrates how we construct $A_1$. For a given input $x \in \{0,1\}^*$,
algorithm $A_1$ uses $F$ to transform $x$ into $f(x)$, and then it uses $A_2$ to test whether
$f(x) \in L_2$. Algorithm $A_1$ takes the output from algorithm $A_2$ and produces that
answer as its own output.

The correctness of $A_1$ follows from condition (34.1). The algorithm runs in
polynomial time, since both $F$ and $A_2$ run in polynomial time (see Exercise 34.1-5). ■
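
The construction in this proof is a composition of two algorithms, which the following Python sketch makes concrete. The two toy languages are assumptions chosen only for illustration: $L_1$ is taken to be the binary strings with an odd number of 1s, $L_2$ the binary strings ending in 1, and the hypothetical reduction maps $x$ to the binary representation of its count of 1s.

```python
from typing import Callable


def decider_via_reduction(
    reduce_f: Callable[[str], str],
    decide_l2: Callable[[str], bool],
) -> Callable[[str], bool]:
    """Build A1 from the reduction algorithm F and the decider A2."""

    def decide_l1(x: str) -> bool:
        # A1 transforms x into f(x) and returns A2's answer unchanged.
        return decide_l2(reduce_f(x))

    return decide_l1


def count_ones_in_binary(x: str) -> str:
    """Assumed reduction f: map x to the binary representation of its number of 1s."""
    return format(x.count("1"), "b")


def ends_in_one(s: str) -> bool:
    """Assumed decider A2 for L2: binary strings whose last symbol is 1."""
    return s.endswith("1")


# A1 decides L1: binary strings with an odd number of 1s.
has_odd_ones = decider_via_reduction(count_ones_in_binary, ends_in_one)
```

The running time of `decide_l1` is the time to compute the reduction plus the time to run the decider on an output whose length is polynomial in the input length, which is why the composition stays polynomial.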

NP-completeness

Polynomial-time reductions provide a formal means for showing that one problem
is at least as hard as another, to within a polynomial-time factor. That is, if
$L_1 \le_P L_2$, then $L_1$ is not more than a polynomial factor harder than $L_2$, which is
why the “less than or equal to” notation for reduction is mnemonic. We can now
define the set of NP-complete languages, which are the hardest problems in NP.

Figure 34.5 The proof of Lemma 34.3. The algorithm $F$ is a reduction algorithm that computes the
reduction function $f$ from $L_1$ to $L_2$ in polynomial time, and $A_2$ is a polynomial-time algorithm that
decides $L_2$. Algorithm $A_1$ decides whether $x \in L_1$ by using $F$ to transform any input $x$ into $f(x)$
and then using $A_2$ to decide whether $f(x) \in L_2$.

A language $L \subseteq \{0,1\}^*$ is NP-complete if

1. $L \in \mathrm{NP}$, and

2. $L' \le_P L$ for every $L' \in \mathrm{NP}$.

If a language $L$ satisfies property 2, but not necessarily property 1, we say that $L$
is NP-hard. We also define NPC to be the class of NP-complete languages.

As  the  following  theorem  shows,  NP-completeness  is  at  the  crux  of  deciding 
whether  P  is  in  fact  equal  to  NP. 

Theorem  34.4 

If any NP-complete problem is polynomial-time solvable, then $\mathrm{P} = \mathrm{NP}$.
Equivalently, if any problem in NP is not polynomial-time solvable, then no NP-complete
problem is polynomial-time solvable.

Proof Suppose that $L \in \mathrm{P}$ and also that $L \in \mathrm{NPC}$. For any $L' \in \mathrm{NP}$, we
have $L' \le_P L$ by property 2 of the definition of NP-completeness. Thus, by
Lemma 34.3, we also have that $L' \in \mathrm{P}$, which proves the first statement of the
theorem.

To prove the second statement, note that it is the contrapositive of the first
statement. ■

It is for this reason that research into the $\mathrm{P} \neq \mathrm{NP}$ question centers around the
NP-complete problems. Most theoretical computer scientists believe that $\mathrm{P} \neq \mathrm{NP}$,
which leads to the relationships among P, NP, and NPC shown in Figure 34.6.
But, for all we know, someone may yet come up with a polynomial-time algorithm
for an NP-complete problem, thus proving that $\mathrm{P} = \mathrm{NP}$. Nevertheless, since
no polynomial-time algorithm for any NP-complete problem has yet been discovered,
a proof that a problem is NP-complete provides excellent evidence that it is
intractable.

Figure 34.6 How most theoretical computer scientists view the relationships among P, NP,
and NPC. Both P and NPC are wholly contained within NP, and $\mathrm{P} \cap \mathrm{NPC} = \emptyset$.

Circuit  satisfiability 

We have defined the notion of an NP-complete problem, but up to this point, we
have not actually proved that any problem is NP-complete. Once we prove that at
least one problem is NP-complete, we can use polynomial-time reducibility as a
tool to prove other problems to be NP-complete. Thus, we now focus on
demonstrating the existence of an NP-complete problem: the circuit-satisfiability
problem.

Unfortunately, the formal proof that the circuit-satisfiability problem is
NP-complete requires technical detail beyond the scope of this text. Instead, we shall
informally describe a proof that relies on a basic understanding of boolean
combinational circuits.

Boolean  combinational  circuits  are  built  from  boolean  combinational  elements 
that  are  interconnected  by  wires.  A  boolean  combinational  element  is  any  circuit 
element  that  has  a  constant  number  of  boolean  inputs  and  outputs  and  that  performs 
a  well-defined  function.  Boolean  values  are  drawn  from  the  set  {0,  1},  where  0 
represents  false  and  1  represents  true. 

The boolean combinational elements that we use in the circuit-satisfiability
problem compute simple boolean functions, and they are known as logic gates.
Figure 34.7 shows the three basic logic gates that we use in the circuit-satisfiability
problem: the NOT gate (or inverter), the AND gate, and the OR gate. The NOT
gate takes a single binary input $x$, whose value is either 0 or 1, and produces a
binary output $z$ whose value is opposite that of the input value. Each of the other
two gates takes two binary inputs $x$ and $y$ and produces a single binary output $z$.

We can describe the operation of each gate, and of any boolean combinational
element, by a truth table, shown under each gate in Figure 34.7. A truth table gives
the outputs of the combinational element for each possible setting of the inputs. For
example, the truth table for the OR gate tells us that when the inputs are $x = 0$
and $y = 1$, the output value is $z = 1$. We use the symbols $\neg$ to denote the NOT
function, $\wedge$ to denote the AND function, and $\vee$ to denote the OR function. Thus,
for example, $0 \vee 1 = 1$.

Figure 34.7 Three basic logic gates, with binary inputs and outputs. Under each gate is the truth
table that describes the gate’s operation. (a) The NOT gate. (b) The AND gate. (c) The OR gate.

We  can  generalize  AND  and  OR  gates  to  take  more  than  two  inputs.  An  AND 
gate’s  output  is  1  if  all  of  its  inputs  are  1,  and  its  output  is  0  otherwise.  An  OR  gate’s 
output  is  1  if  any  of  its  inputs  are  1,  and  its  output  is  0  otherwise. 
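
As a small illustration of these definitions, the three gate types can be written as Python functions over the values 0 and 1. The function names are assumptions made here for the sketch; the multi-input AND and OR follow the generalization just described.

```python
def gate_not(x: int) -> int:
    """NOT gate: output is the opposite of the single input."""
    return 1 - x


def gate_and(*inputs: int) -> int:
    """AND gate: output is 1 if all inputs are 1, and 0 otherwise."""
    return int(all(inputs))


def gate_or(*inputs: int) -> int:
    """OR gate: output is 1 if any input is 1, and 0 otherwise."""
    return int(any(inputs))
```

For instance, `gate_or(0, 1)` reproduces the truth-table row $0 \vee 1 = 1$ mentioned above.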

A  boolean  combinational  circuit  consists  of  one  or  more  boolean  combinational 
elements  interconnected  by  wires.  A  wire  can  connect  the  output  of  one  element 
to  the  input  of  another,  thereby  providing  the  output  value  of  the  first  element  as  an 
input value of the second. Figure 34.8 shows two similar boolean combinational
circuits, differing in only one gate. Part (a) of the figure also shows the values on
the individual wires, given the input $\langle x_1 = 1, x_2 = 1, x_3 = 0 \rangle$. Although a single
wire may have no more than one combinational-element output connected to it, it
can feed several element inputs. The number of element inputs fed by a wire is
called the fan-out of the wire. If no element output is connected to a wire, the wire
is  a  circuit  input,  accepting  input  values  from  an  external  source.  If  no  element 
input  is  connected  to  a  wire,  the  wire  is  a  circuit  output,  providing  the  results  of 
the  circuit’s  computation  to  the  outside  world.  (An  internal  wire  can  also  fan  out 
to  a  circuit  output.)  For  the  purpose  of  defining  the  circuit-satisfiability  problem, 
we  limit  the  number  of  circuit  outputs  to  1,  though  in  actual  hardware  design,  a 
boolean  combinational  circuit  may  have  multiple  outputs. 

Boolean combinational circuits contain no cycles. In other words, suppose we
create a directed graph $G = (V, E)$ with one vertex for each combinational element
and with $k$ directed edges for each wire whose fan-out is $k$; the graph contains
a directed edge $(u, v)$ if a wire connects the output of element $u$ to an input of
element $v$. Then $G$ must be acyclic.
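
One way to check this acyclicity condition in practice is to attempt a topological sort of the graph. The sketch below uses Python's standard `graphlib` module; the representation of wires as `(u, v)` pairs of element names is an assumption made for the example.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(wires: list[tuple[str, str]]) -> bool:
    """Return True if the element graph has no directed cycle.

    Each pair (u, v) means a wire connects the output of element u
    to an input of element v.
    """
    sorter = TopologicalSorter()
    for u, v in wires:
        sorter.add(v, u)  # v depends on its predecessor u
    try:
        sorter.prepare()  # raises CycleError if the graph contains a cycle
    except CycleError:
        return False
    return True
```

A wire with fan-out $k$ simply contributes $k$ pairs with the same first component.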
Figure 34.8 Two instances of the circuit-satisfiability problem. (a) The assignment $\langle x_1 = 1,
x_2 = 1, x_3 = 0 \rangle$ to the inputs of this circuit causes the output of the circuit to be 1. The circuit
is therefore satisfiable. (b) No assignment to the inputs of this circuit can cause the output of the
circuit to be 1. The circuit is therefore unsatisfiable.

A truth assignment for a boolean combinational circuit is a set of boolean input
values. We say that a one-output boolean combinational circuit is satisfiable if it
has a satisfying assignment: a truth assignment that causes the output of the circuit
to be 1. For example, the circuit in Figure 34.8(a) has the satisfying assignment
$\langle x_1 = 1, x_2 = 1, x_3 = 0 \rangle$, and so it is satisfiable. As Exercise 34.3-1 asks you to
show, no assignment of values to $x_1$, $x_2$, and $x_3$ causes the circuit in Figure 34.8(b)
to produce a 1 output; it always produces 0, and so it is unsatisfiable.

The  circuit-satisfiability  problem  is,  “Given  a  boolean  combinational  circuit 
composed  of  AND,  OR,  and  NOT  gates,  is  it  satisfiable?”  In  order  to  pose  this 
question  formally,  however,  we  must  agree  on  a  standard  encoding  for  circuits. 
The size of a boolean combinational circuit is the number of boolean combinational
elements plus the number of wires in the circuit. We could devise a graphlike
encoding that maps any given circuit $C$ into a binary string $\langle C \rangle$ whose length is
polynomial in the size of the circuit itself. As a formal language, we can therefore
define

CIRCUIT-SAT = $\{\langle C \rangle : C$ is a satisfiable boolean combinational circuit$\}$ .

The  circuit-satisfiability  problem  arises  in  the  area  of  computer-aided  hardware 
optimization.  If  a  subcircuit  always  produces  0,  that  subcircuit  is  unnecessary; 
the  designer  can  replace  it  by  a  simpler  subcircuit  that  omits  all  logic  gates  and 
provides  the  constant  0  value  as  its  output.  You  can  see  why  we  would  like  to  have 
a  polynomial-time  algorithm  for  this  problem. 

Given a circuit $C$, we might attempt to determine whether it is satisfiable by
simply checking all possible assignments to the inputs. Unfortunately, if the circuit
has $k$ inputs, then we would have to check up to $2^k$ possible assignments. When
the size of $C$ is polynomial in $k$, checking each one takes $\Omega(2^k)$ time, which is
superpolynomial in the size of the circuit.⁹ In fact, as we have claimed, there is
strong evidence that no polynomial-time algorithm exists that solves the
circuit-satisfiability problem because circuit satisfiability is NP-complete. We break the
proof of this fact into two parts, based on the two parts of the definition of
NP-completeness.
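
The exhaustive approach can be sketched as follows. The circuit representation (a list of gates in topological order, each a triple of output wire, gate type, and input wires) and both example circuits are assumptions made for this illustration; they are not the circuits of Figure 34.8.

```python
from itertools import product
from typing import Optional

# A gate is (output_wire, op, input_wires); gates are listed in topological order.
Gate = tuple[str, str, tuple[str, ...]]


def evaluate(gates: list[Gate], assignment: dict[str, int], output: str) -> int:
    """Propagate an input assignment through the gates and return the output bit."""
    values = dict(assignment)
    for out, op, ins in gates:
        bits = [values[w] for w in ins]
        if op == "NOT":
            values[out] = 1 - bits[0]
        elif op == "AND":
            values[out] = int(all(bits))
        elif op == "OR":
            values[out] = int(any(bits))
        else:
            raise ValueError(f"unknown gate type: {op}")
    return values[output]


def brute_force_sat(
    inputs: list[str], gates: list[Gate], output: str
) -> Optional[dict[str, int]]:
    """Try all 2^k input assignments; return a satisfying one, or None."""
    for bits in product((0, 1), repeat=len(inputs)):
        assignment = dict(zip(inputs, bits))
        if evaluate(gates, assignment, output) == 1:
            return assignment
    return None


# Hypothetical circuit computing (x1 AND x2) AND (NOT x3).
SAT_GATES: list[Gate] = [
    ("n3", "NOT", ("x3",)),
    ("a", "AND", ("x1", "x2")),
    ("out", "AND", ("a", "n3")),
]

# Hypothetical circuit computing x1 AND (NOT x1), which is never 1.
UNSAT_GATES: list[Gate] = [
    ("n1", "NOT", ("x1",)),
    ("out", "AND", ("x1", "n1")),
]
```

The loop in `brute_force_sat` is the source of the $2^k$ factor: each individual evaluation is cheap, but the number of candidate assignments doubles with every additional input.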

Lemma  34.5 

The  circuit-satisfiability  problem  belongs  to  the  class  NP. 

Proof We shall provide a two-input, polynomial-time algorithm $A$ that can verify
CIRCUIT-SAT. One of the inputs to $A$ is (a standard encoding of) a boolean
combinational circuit $C$. The other input is a certificate corresponding to an assignment
of boolean values to the wires in $C$. (See Exercise 34.3-4 for a smaller certificate.)

We  construct  the  algorithm  A  as  follows.  For  each  logic  gate  in  the  circuit,  it 
checks  that  the  value  provided  by  the  certificate  on  the  output  wire  is  correctly 
computed  as  a  function  of  the  values  on  the  input  wires.  Then,  if  the  output  of  the 
entire circuit is 1, the algorithm outputs 1, since the values assigned to the inputs
of  C  provide  a  satisfying  assignment.  Otherwise,  A  outputs  0. 

Whenever a satisfiable circuit $C$ is input to algorithm $A$, there exists a certificate
whose length is polynomial in the size of $C$ and that causes $A$ to output a 1.
Whenever an unsatisfiable circuit is input, no certificate can fool $A$ into believing that
the circuit is satisfiable. Algorithm $A$ runs in polynomial time: with a good
implementation, linear time suffices. Thus, we can verify CIRCUIT-SAT in polynomial
time, and CIRCUIT-SAT $\in$ NP. ■
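
A minimal Python sketch of such a verifier appears below. The certificate is assumed to be a dictionary giving a value to every wire, and the circuit representation (gate triples of output wire, gate type, and input wires) and the example circuit are hypothetical choices for this illustration; the circuit is assumed to be well formed.

```python
# A gate is (output_wire, op, input_wires).
Gate = tuple[str, str, tuple[str, ...]]


def gate_value(op: str, bits: list[int]) -> int:
    """Compute the output of one logic gate from its input bits."""
    if op == "NOT":
        return 1 - bits[0]
    if op == "AND":
        return int(all(bits))
    if op == "OR":
        return int(any(bits))
    raise ValueError(f"unknown gate type: {op}")


def verify_circuit_sat(
    inputs: list[str], gates: list[Gate], output: str, certificate: dict[str, int]
) -> bool:
    """Accept only if the certificate is a consistent wire labeling with output 1."""
    wires = set(inputs) | {out for out, _, _ in gates}
    # Every wire must carry a boolean value.
    if any(certificate.get(w) not in (0, 1) for w in wires):
        return False
    # Each gate's output wire must match the function of its input wires.
    for out, op, ins in gates:
        if certificate[out] != gate_value(op, [certificate[w] for w in ins]):
            return False
    return certificate[output] == 1


# Hypothetical circuit computing (x1 AND x2) AND (NOT x3).
INPUTS = ["x1", "x2", "x3"]
GATES: list[Gate] = [
    ("n3", "NOT", ("x3",)),
    ("a", "AND", ("x1", "x2")),
    ("out", "AND", ("a", "n3")),
]
```

The verifier makes one pass over the wires and one pass over the gates, so in this representation its running time is linear in the size of the circuit.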

The  second  part  of  proving  that  CIRCUIT-SAT  is  NP-complete  is  to  show  that 
the  language  is  NP-hard.  That  is,  we  must  show  that  every  language  in  NP  is 
polynomial-time  reducible  to  CIRCUIT-SAT.  The  actual  proof  of  this  fact  is  full 
of  technical  intricacies,  and  so  we  shall  settle  for  a  sketch  of  the  proof  based  on 
some  understanding  of  the  workings  of  computer  hardware. 

A computer program is stored in the computer memory as a sequence of
instructions. A typical instruction encodes an operation to be performed, addresses
of operands in memory, and an address where the result is to be stored. A special
memory location, called the program counter, keeps track of which instruction
is to be executed next. The program counter automatically increments upon
fetching each instruction, thereby causing the computer to execute instructions
sequentially. The execution of an instruction can cause a value to be written to the
program counter, however, which alters the normal sequential execution and allows
the computer to loop and perform conditional branches.

⁹ On the other hand, if the size of the circuit $C$ is $\Theta(2^k)$, then an algorithm whose running time
is $O(2^k)$ has a running time that is polynomial in the circuit size. Even if $\mathrm{P} \neq \mathrm{NP}$, this
situation would not contradict the NP-completeness of the problem; the existence of a polynomial-time
algorithm for a special case does not imply that there is a polynomial-time algorithm for all cases.

At  any  point  during  the  execution  of  a  program,  the  computer’s  memory  holds 
the  entire  state  of  the  computation.  (We  take  the  memory  to  include  the  program 
itself,  the  program  counter,  working  storage,  and  any  of  the  various  bits  of  state 
that a computer maintains for bookkeeping.) We call any particular state of
computer memory a configuration. We can view the execution of an instruction as
mapping  one  configuration  to  another.  The  computer  hardware  that  accomplishes 
this  mapping  can  be  implemented  as  a  boolean  combinational  circuit,  which  we 
denote  by  M  in  the  proof  of  the  following  lemma. 

Lemma  34.6 

The  circuit-satisfiability  problem  is  NP-hard. 

Proof Let $L$ be any language in NP. We shall describe a polynomial-time
algorithm $F$ computing a reduction function $f$ that maps every binary string $x$ to a
circuit $C = f(x)$ such that $x \in L$ if and only if $C \in$ CIRCUIT-SAT.

Since $L \in \mathrm{NP}$, there must exist an algorithm $A$ that verifies $L$ in polynomial
time. The algorithm $F$ that we shall construct uses the two-input algorithm $A$ to
compute the reduction function $f$.

Let $T(n)$ denote the worst-case running time of algorithm $A$ on length-$n$ input
strings, and let $k \ge 1$ be a constant such that $T(n) = O(n^k)$ and the length of the
certificate is $O(n^k)$. (The running time of $A$ is actually a polynomial in the total
input size, which includes both an input string and a certificate, but since the length
of the certificate is polynomial in the length $n$ of the input string, the running time
is polynomial in $n$.)

The basic idea of the proof is to represent the computation of $A$ as a sequence
of configurations. As Figure 34.9 illustrates, we can break each configuration into
parts consisting of the program for $A$, the program counter and auxiliary machine
state, the input $x$, the certificate $y$, and working storage. The combinational
circuit $M$, which implements the computer hardware, maps each configuration $c_i$ to
the next configuration $c_{i+1}$, starting from the initial configuration $c_0$. Algorithm $A$
writes its output (0 or 1) to some designated location by the time it finishes
executing, and if we assume that thereafter $A$ halts, the value never changes. Thus,
if the algorithm runs for at most $T(n)$ steps, the output appears as one of the bits
in $c_{T(n)}$.

The reduction algorithm $F$ constructs a single combinational circuit that
computes all configurations produced by a given initial configuration. The idea is to
paste together $T(n)$ copies of the circuit $M$. The output of the $i$th circuit, which
produces configuration $c_i$, feeds directly into the input of the $(i+1)$st circuit. Thus,
the configurations, rather than being stored in the computer’s memory, simply
reside as values on the wires connecting copies of $M$.

Figure 34.9 The sequence of configurations produced by an algorithm $A$ running on an input $x$ and
certificate $y$. Each configuration represents the state of the computer for one step of the computation
and, besides $A$, $x$, and $y$, includes the program counter (PC), auxiliary machine state, and working
storage. Except for the certificate $y$, the initial configuration $c_0$ is constant. A boolean combinational
circuit $M$ maps each configuration to the next configuration. The output is a distinguished bit in the
working storage.

Recall what the polynomial-time reduction algorithm $F$ must do. Given an
input $x$, it must compute a circuit $C = f(x)$ that is satisfiable if and only if there
exists a certificate $y$ such that $A(x, y) = 1$. When $F$ obtains an input $x$, it first
computes $n = |x|$ and constructs a combinational circuit $C'$ consisting of $T(n)$
copies of $M$. The input to $C'$ is an initial configuration corresponding to a
computation on $A(x, y)$, and the output is the configuration $c_{T(n)}$.

Algorithm $F$ modifies circuit $C'$ slightly to construct the circuit $C = f(x)$.
First, it wires the inputs to $C'$ corresponding to the program for $A$, the initial
program counter, the input $x$, and the initial state of memory directly to these known
values. Thus, the only remaining inputs to the circuit correspond to the
certificate $y$. Second, it ignores all outputs from $C'$, except for the one bit of $c_{T(n)}$
corresponding to the output of $A$. This circuit $C$, so constructed, computes
$C(y) = A(x, y)$ for any input $y$ of length $O(n^k)$. The reduction algorithm $F$,
when provided an input string $x$, computes such a circuit $C$ and outputs it.
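
The shape of this construction can be imitated in Python, with ordinary function composition standing in for wiring copies of $M$ together. Everything concrete in this sketch is an assumption made for illustration: the toy verifier accepts exactly when $y$ is the bitwise complement of $x$, a configuration is reduced to the tuple `(pc, out, x, y)`, and $T(n)$ is taken to be $n$. It shows the structure of the reduction, not an actual gate-level circuit.

```python
from typing import Callable

# Toy configuration: (program counter, output bit, input x, certificate y).
Config = tuple[int, int, tuple[int, ...], tuple[int, ...]]


def step(c: Config) -> Config:
    """Toy stand-in for M: map one configuration to the next."""
    pc, out, x, y = c
    if pc >= len(x):
        return c  # halted: the configuration no longer changes
    # The output bit stays 1 only while y[pc] is the complement of x[pc].
    return (pc + 1, out & (x[pc] ^ y[pc]), x, y)


def build_circuit(x: tuple[int, ...]) -> Callable[[tuple[int, ...]], int]:
    """Toy stand-in for F: return C with x hardwired and only y left as input."""
    t = len(x)  # assumed running-time bound T(n) = n for the toy verifier

    def circuit(y: tuple[int, ...]) -> int:
        c: Config = (0, 1, x, y)  # initial configuration c_0; only y is free
        for _ in range(t):  # T(n) chained copies of M
            c = step(c)
        return c[1]  # keep only the output bit of c_{T(n)}

    return circuit
```

In this toy setting `build_circuit(x)` plays the role of $f(x)$: the returned function evaluates to 1 on some $y$ of length $n$ exactly when the toy verifier accepts $(x, y)$.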

We need to prove two properties. First, we must show that $F$ correctly computes
a reduction function $f$. That is, we must show that $C$ is satisfiable if and only if
there exists a certificate $y$ such that $A(x, y) = 1$. Second, we must show that $F$
runs in polynomial time.

To show that $F$ correctly computes a reduction function, let us suppose that there
exists a certificate $y$ of length $O(n^k)$ such that $A(x, y) = 1$. Then, if we apply the
bits of $y$ to the inputs of $C$, the output of $C$ is $C(y) = A(x, y) = 1$. Thus, if a
certificate exists, then $C$ is satisfiable. For the other direction, suppose that $C$ is
satisfiable. Hence, there exists an input $y$ to $C$ such that $C(y) = 1$, from which
we conclude that $A(x, y) = 1$. Thus, $F$ correctly computes a reduction function.

To complete the proof sketch, we need only show that $F$ runs in time polynomial
in $n = |x|$. The first observation we make is that the number of bits required to
represent a configuration is polynomial in $n$. The program for $A$ itself has constant
size, independent of the length of its input $x$. The length of the input $x$ is $n$, and
the length of the certificate $y$ is $O(n^k)$. Since the algorithm runs for at most $O(n^k)$
steps, the amount of working storage required by $A$ is polynomial in $n$ as well.
(We assume that this memory is contiguous; Exercise 34.3-5 asks you to extend
the argument to the situation in which the locations accessed by $A$ are scattered
across a much larger region of memory and the particular pattern of scattering can
differ for each input $x$.)

The combinational circuit $M$ implementing the computer hardware has size
polynomial in the length of a configuration, which is $O(n^k)$; hence, the size of $M$
is polynomial in $n$. (Most of this circuitry implements the logic of the memory
system.) The circuit $C$ consists of at most $t = O(n^k)$ copies of $M$, and hence it
has size polynomial in $n$. The reduction algorithm $F$ can construct $C$ from $x$ in
polynomial time, since each step of the construction takes polynomial time. ■

The  language  CIRCUIT-SAT  is  therefore  at  least  as  hard  as  any  language  in  NP, 
and  since  it  belongs  to  NP,  it  is  NP-complete. 

Theorem  34.7 

The  circuit-satisfiability  problem  is  NP-complete. 

Proof  Immediate  from  Lemmas  34.5  and  34.6  and  from  the  definition  of  NP- 
completeness.  ■ 

Exercises 


34.3-1 

Verify  that  the  circuit  in  Figure  34.8(b)  is  unsatisfiable. 


34.3-2 

Show that the $\le_P$ relation is a transitive relation on languages. That is, show that if
$L_1 \le_P L_2$ and $L_2 \le_P L_3$, then $L_1 \le_P L_3$.


34.3-3 

Prove that $L \le_P \overline{L}$ if and only if $\overline{L} \le_P L$.


34.3-4 

Show that we could have used a satisfying assignment as a certificate in an
alternative proof of Lemma 34.5. Which certificate makes for an easier proof?


34.3-5 

The proof of Lemma 34.6 assumes that the working storage for algorithm $A$
occupies a contiguous region of polynomial size. Where in the proof do we exploit this
assumption? Argue that this assumption does not involve any loss of generality.


34.3-6 

A language $L$ is complete for a language class $C$ with respect to polynomial-time
reductions if $L \in C$ and $L' \le_P L$ for all $L' \in C$. Show that $\emptyset$ and $\{0,1\}^*$ are the
only languages in P that are not complete for P with respect to polynomial-time
reductions.
34.3-7

Show  that,  with  respect  to  polynomial-time  reductions  (see  Exercise  34.3-6),  L  is 
complete  for  NP  if  and  only  if  L  is  complete  for  co-NP. 


34.3-8 

The reduction algorithm $F$ in the proof of Lemma 34.6 constructs the circuit
$C = f(x)$ based on knowledge of $x$, $A$, and $k$. Professor Sartre observes that
the string $x$ is input to $F$, but only the existence of $A$, $k$, and the constant factor
implicit in the $O(n^k)$ running time is known to $F$ (since the language $L$ belongs
to NP), not their actual values. Thus, the professor concludes that $F$ can’t possibly
construct the circuit $C$ and that the language CIRCUIT-SAT is not necessarily
NP-hard. Explain the flaw in the professor’s reasoning.


34.4  NP-completeness  proofs 

We proved that the circuit-satisfiability problem is NP-complete by a direct proof
that $L \le_P$ CIRCUIT-SAT for every language $L \in \mathrm{NP}$. In this section, we shall
show how to prove that languages are NP-complete without directly reducing every
language in NP to the given language. We shall illustrate this methodology by
proving that various formula-satisfiability problems are NP-complete. Section 34.5
provides many more examples of the methodology.

The  following  lemma  is  the  basis  of  our  method  for  showing  that  a  language  is 
NP-complete. 

Lemma  34.8 

If $L$ is a language such that $L' \le_P L$ for some $L' \in \mathrm{NPC}$, then $L$ is NP-hard. If, in
addition, $L \in \mathrm{NP}$, then $L \in \mathrm{NPC}$.

Proof  Since $L'$ is NP-complete, for all $L'' \in \mathrm{NP}$, we have $L'' \le_P L'$. By
supposition, $L' \le_P L$, and thus by transitivity (Exercise 34.3-2), we have $L'' \le_P L$,
which shows that $L$ is NP-hard. If $L \in \mathrm{NP}$, we also have $L \in \mathrm{NPC}$. ■

In other words, by reducing a known NP-complete language $L'$ to $L$, we implicitly
reduce every language in NP to $L$. Thus, Lemma 34.8 gives us a method for
proving that a language $L$ is NP-complete:

1. Prove $L \in \mathrm{NP}$.

2. Select a known NP-complete language $L'$.

3. Describe an algorithm that computes a function $f$ mapping every instance
$x \in \{0, 1\}^*$ of $L'$ to an instance $f(x)$ of $L$.

4. Prove that the function $f$ satisfies $x \in L'$ if and only if $f(x) \in L$ for all
$x \in \{0, 1\}^*$.

5. Prove that the algorithm computing $f$ runs in polynomial time.

(Steps 2-5 show that $L$ is NP-hard.) This methodology of reducing from a single
known NP-complete language is far simpler than the more complicated process
of showing directly how to reduce from every language in NP. Proving
CIRCUIT-SAT $\in \mathrm{NPC}$ has given us a “foot in the door.” Because we know that the
circuit-satisfiability problem is NP-complete, we now can prove much more easily
that other problems are NP-complete. Moreover, as we develop a catalog of known
NP-complete problems, we will have more and more choices for languages from
which to reduce.

Formula  satisfiability 

We  illustrate  the  reduction  methodology  by  giving  an  NP-completeness  proof  for 
the  problem  of  determining  whether  a  boolean  formula,  not  a  circuit,  is  satisfiable. 
This  problem  has  the  historical  honor  of  being  the  first  problem  ever  shown  to  be 
NP-complete. 

We formulate the (formula) satisfiability problem in terms of the language SAT
as follows. An instance of SAT is a boolean formula $\phi$ composed of

1. $n$ boolean variables: $x_1, x_2, \ldots, x_n$;

2. $m$ boolean connectives: any boolean function with one or two inputs and one
output, such as $\wedge$ (AND), $\vee$ (OR), $\neg$ (NOT), $\rightarrow$ (implication), $\leftrightarrow$ (if and only
if); and

3.  parentheses.  (Without  loss  of  generality,  we  assume  that  there  are  no  redundant 
parentheses,  i.e.,  a  formula  contains  at  most  one  pair  of  parentheses  per  boolean 
connective.) 

We can easily encode a boolean formula $\phi$ in a length that is polynomial in $n + m$.
As in boolean combinational circuits, a truth assignment for a boolean formula $\phi$
is a set of values for the variables of $\phi$, and a satisfying assignment is a truth
assignment that causes it to evaluate to 1. A formula with a satisfying assignment
is a satisfiable formula. The satisfiability problem asks whether a given boolean
formula is satisfiable; in formal-language terms,

SAT = $\{\langle \phi \rangle : \phi \text{ is a satisfiable boolean formula}\}$.

As an example, the formula

$$\phi = ((x_1 \rightarrow x_2) \vee \neg((\neg x_1 \leftrightarrow x_3) \vee x_4)) \wedge \neg x_2$$

has the satisfying assignment $\langle x_1 = 0, x_2 = 0, x_3 = 1, x_4 = 1 \rangle$, since

$$\begin{aligned}
\phi &= ((0 \rightarrow 0) \vee \neg((\neg 0 \leftrightarrow 1) \vee 1)) \wedge \neg 0 \\
&= (1 \vee \neg(1 \vee 1)) \wedge 1 \\
&= (1 \vee 0) \wedge 1 \\
&= 1,
\end{aligned} \tag{34.2}$$

and thus this formula $\phi$ belongs to SAT.

The naive algorithm to determine whether an arbitrary boolean formula is satisfiable
does not run in polynomial time. A formula with $n$ variables has $2^n$ possible
assignments. If the length of $\langle \phi \rangle$ is polynomial in $n$, then checking every
assignment requires $\Omega(2^n)$ time, which is superpolynomial in the length of $\langle \phi \rangle$. As the
following theorem shows, a polynomial-time algorithm is unlikely to exist.
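The naive algorithm can be sketched in a few lines. This is only an illustration, assuming the formula is supplied as a Python function of $n$ boolean arguments; the names `brute_force_sat` and `example_formula` are hypothetical and not from the text. The loop may visit all $2^n$ assignments, which is exactly the exponential cost described above.

```python
from itertools import product
from typing import Callable, Optional, Tuple


def brute_force_sat(formula: Callable[..., bool], n: int) -> Optional[Tuple[bool, ...]]:
    """Try the 2**n truth assignments in turn; return a satisfying one, or None."""
    for assignment in product([False, True], repeat=n):
        if formula(*assignment):
            return assignment
    return None


def example_formula(x1: bool, x2: bool, x3: bool, x4: bool) -> bool:
    """((x1 -> x2) OR NOT((NOT x1 <-> x3) OR x4)) AND NOT x2."""
    implication = (not x1) or x2
    equivalence = (not x1) == x3
    return (implication or not (equivalence or x4)) and not x2


# Search for a satisfying assignment of the example formula.
witness = brute_force_sat(example_formula, 4)
```

The search returns the first satisfying assignment it meets, which need not be the one used in the worked example.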

Theorem  34.9 

Satisfiability  of  boolean  formulas  is  NP-complete. 

Proof  We start by arguing that SAT $\in \mathrm{NP}$. Then we prove that SAT is NP-hard by
showing that CIRCUIT-SAT $\le_P$ SAT; by Lemma 34.8, this will prove the theorem.

To show that SAT belongs to NP, we show that a certificate consisting of a
satisfying assignment for an input formula $\phi$ can be verified in polynomial time.
The verifying algorithm simply replaces each variable in the formula with its
corresponding value and then evaluates the expression, much as we did in
equation (34.2) above. This task is easy to do in polynomial time. If the expression
evaluates to 1, then the algorithm has verified that the formula is satisfiable. Thus,
the first condition of Lemma 34.8 for NP-completeness holds.
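A minimal sketch of such a verifier follows. It assumes, purely for illustration, that a formula is stored as a nested tuple such as `("and", f, g)` or `("var", "x1")`; this encoding and the names `evaluate`, `verify_sat`, and `PHI` are hypothetical. Each node of the tree is visited once, so the running time is linear in the size of the formula.

```python
def evaluate(formula, assignment):
    """Evaluate a formula tree under a truth assignment (dict: name -> bool)."""
    op = formula[0]
    if op == "var":
        return assignment[formula[1]]
    if op == "not":
        return not evaluate(formula[1], assignment)
    left = evaluate(formula[1], assignment)
    right = evaluate(formula[2], assignment)
    if op == "and":
        return left and right
    if op == "or":
        return left or right
    if op == "implies":
        return (not left) or right
    if op == "iff":
        return left == right
    raise ValueError(f"unknown connective: {op}")


def verify_sat(formula, certificate):
    """Accept exactly when the certificate is a satisfying assignment."""
    return evaluate(formula, certificate)


def var(name):
    return ("var", name)


# ((x1 -> x2) OR NOT((NOT x1 <-> x3) OR x4)) AND NOT x2
PHI = ("and",
       ("or",
        ("implies", var("x1"), var("x2")),
        ("not", ("or", ("iff", ("not", var("x1")), var("x3")), var("x4")))),
       ("not", var("x2")))

accepted = verify_sat(PHI, {"x1": False, "x2": False, "x3": True, "x4": True})
```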

To prove that SAT is NP-hard, we show that CIRCUIT-SAT $\le_P$ SAT. In other
words, we need to show how to reduce any instance of circuit satisfiability to an
instance of formula satisfiability in polynomial time. We can use induction to
express any boolean combinational circuit as a boolean formula. We simply look
at the gate that produces the circuit output and inductively express each of the
gate’s inputs as formulas. We then obtain the formula for the circuit by writing an
expression that applies the gate’s function to its inputs’ formulas.

Unfortunately, this straightforward method does not amount to a polynomial-time
reduction. As Exercise 34.4-1 asks you to show, shared subformulas, which
arise from gates whose output wires have fan-out of 2 or more, can cause the
size of the generated formula to grow exponentially. Thus, the reduction algorithm
must be somewhat more clever.

Figure 34.10 illustrates how we overcome this problem, using as an example
the circuit from Figure 34.8(a). For each wire $x_i$ in the circuit $C$, the formula $\phi$
has a variable $x_i$. We can now express how each gate operates as a small formula
involving the variables of its incident wires. For example, the operation of the
output AND gate is $x_{10} \leftrightarrow (x_7 \wedge x_8 \wedge x_9)$. We call each of these small formulas a
clause.

Figure 34.10 Reducing circuit satisfiability to formula satisfiability. The formula produced by the
reduction algorithm has a variable for each wire in the circuit.

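The formula $\phi$ produced by the reduction algorithm is the AND of the circuit-output
variable with the conjunction of clauses describing the operation of each
gate. For the circuit in the figure, the formula is

$$\begin{aligned}
\phi = x_{10} &\wedge (x_4 \leftrightarrow \neg x_3) \\
&\wedge (x_5 \leftrightarrow (x_1 \vee x_2)) \\
&\wedge (x_6 \leftrightarrow \neg x_4) \\
&\wedge (x_7 \leftrightarrow (x_1 \wedge x_2 \wedge x_4)) \\
&\wedge (x_8 \leftrightarrow (x_5 \vee x_6)) \\
&\wedge (x_9 \leftrightarrow (x_6 \vee x_7)) \\
&\wedge (x_{10} \leftrightarrow (x_7 \wedge x_8 \wedge x_9)) .
\end{aligned}$$

Given a circuit $C$, it is straightforward to produce such a formula $\phi$ in polynomial
time.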
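The construction can be sketched as follows. The gate list is read off from the clauses of the formula above; the tuple encoding of gates and the names `GATES`, `formula_text`, and `formula_value` are assumptions made for illustration. The sketch emits one clause per gate, so the formula grows linearly with the number of gates.

```python
from itertools import product

# Each gate is (output wire, operation, input wires), listed in topological order.
GATES = [
    (4, "NOT", [3]),
    (5, "OR", [1, 2]),
    (6, "NOT", [4]),
    (7, "AND", [1, 2, 4]),
    (8, "OR", [5, 6]),
    (9, "OR", [6, 7]),
    (10, "AND", [7, 8, 9]),
]
OUTPUT_WIRE = 10
NUM_INPUTS = 3


def apply_gate(op, values):
    if op == "NOT":
        return not values[0]
    if op == "AND":
        return all(values)
    if op == "OR":
        return any(values)
    raise ValueError(op)


def circuit_output(inputs):
    """Evaluate the circuit on a tuple of input-wire values."""
    wire = {i + 1: value for i, value in enumerate(inputs)}
    for out, op, ins in GATES:
        wire[out] = apply_gate(op, [wire[w] for w in ins])
    return wire[OUTPUT_WIRE]


def formula_text():
    """Build the formula: the output variable ANDed with one clause per gate."""
    parts = [f"x{OUTPUT_WIRE}"]
    for out, op, ins in GATES:
        if op == "NOT":
            rhs = f"NOT x{ins[0]}"
        else:
            rhs = "(" + f" {op} ".join(f"x{w}" for w in ins) + ")"
        parts.append(f"(x{out} <-> {rhs})")
    return " AND ".join(parts)


def formula_value(wire):
    """Evaluate the formula under an assignment to all wire variables."""
    return wire[OUTPUT_WIRE] and all(
        wire[out] == apply_gate(op, [wire[w] for w in ins])
        for out, op, ins in GATES
    )
```

For this small circuit, comparing `circuit_output` over all input settings with `formula_value` over all wire assignments gives a brute-force check that the two are satisfiable together.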

Why is the circuit $C$ satisfiable exactly when the formula $\phi$ is satisfiable? If $C$
has a satisfying assignment, then each wire of the circuit has a well-defined value,
and the output of the circuit is 1. Therefore, when we assign wire values to
variables in $\phi$, each clause of $\phi$ evaluates to 1, and thus the conjunction of all
evaluates to 1. Conversely, if some assignment causes $\phi$ to evaluate to 1, then every
clause holds, so each wire variable equals the value its gate computes, and the
output variable is 1; the values given to the input-wire variables therefore satisfy $C$.
Thus, we have shown that CIRCUIT-SAT $\le_P$ SAT, which completes the proof. ■

3-CNF satisfiability

We can prove many problems NP-complete by reducing from formula satisfiability.
The reduction algorithm must handle any input formula, though, and this requirement
can lead to a huge number of cases that we must consider. We often prefer
to reduce from a restricted language of boolean formulas, so that we need to consider
fewer cases. Of course, we must not restrict the language so much that it
becomes polynomial-time solvable. One convenient language is 3-CNF satisfiability,
or 3-CNF-SAT.

We define 3-CNF satisfiability using the following terms. A literal in a boolean
formula is an occurrence of a variable or its negation. A boolean formula is in
conjunctive normal form, or CNF, if it is expressed as an AND of clauses, each
of which is the OR of one or more literals. A boolean formula is in 3-conjunctive
normal form, or 3-CNF, if each clause has exactly three distinct literals.

For example, the boolean formula

$$(x_1 \vee \neg x_1 \vee \neg x_2) \wedge (x_3 \vee x_2 \vee x_4) \wedge (\neg x_1 \vee \neg x_3 \vee \neg x_4)$$

is in 3-CNF. The first of its three clauses is $(x_1 \vee \neg x_1 \vee \neg x_2)$, which contains the
three literals $x_1$, $\neg x_1$, and $\neg x_2$.

In 3-CNF-SAT, we are asked whether a given boolean formula $\phi$ in 3-CNF is
satisfiable. The following theorem shows that a polynomial-time algorithm that
can determine the satisfiability of boolean formulas is unlikely to exist, even when
they are expressed in this simple normal form.

Theorem  34.10 

Satisfiability  of  boolean  formulas  in  3-conjunctive  normal  form  is  NP-complete. 

Proof  The argument we used in the proof of Theorem 34.9 to show that SAT $\in \mathrm{NP}$
applies equally well here to show that 3-CNF-SAT $\in \mathrm{NP}$. By Lemma 34.8,
therefore, we need only show that SAT $\le_P$ 3-CNF-SAT.

We break the reduction algorithm into three basic steps. Each step progressively
transforms the input formula $\phi$ closer to the desired 3-conjunctive normal form.

The first step is similar to the one used to prove CIRCUIT-SAT $\le_P$ SAT in
Theorem 34.9. First, we construct a binary “parse” tree for the input formula $\phi$,
with literals as leaves and connectives as internal nodes. Figure 34.11 shows such
a parse tree for the formula

$$\phi = ((x_1 \rightarrow x_2) \vee \neg((\neg x_1 \leftrightarrow x_3) \vee x_4)) \wedge \neg x_2 . \tag{34.3}$$

Should the input formula contain a clause such as the OR of several literals, we use
associativity to parenthesize the expression fully so that every internal node in the
resulting tree has 1 or 2 children. We can now think of the binary parse tree as a
circuit for computing the function.

Figure 34.11 The tree corresponding to the formula
$\phi = ((x_1 \rightarrow x_2) \vee \neg((\neg x_1 \leftrightarrow x_3) \vee x_4)) \wedge \neg x_2$.


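Mimicking the reduction in the proof of Theorem 34.9, we introduce a variable $y_i$
for the output of each internal node. Then, we rewrite the original formula $\phi$
as the AND of the root variable and a conjunction of clauses describing the
operation of each node. For the formula (34.3), the resulting expression is

$$\begin{aligned}
\phi' = y_1 &\wedge (y_1 \leftrightarrow (y_2 \wedge \neg x_2)) \\
&\wedge (y_2 \leftrightarrow (y_3 \vee y_4)) \\
&\wedge (y_3 \leftrightarrow (x_1 \rightarrow x_2)) \\
&\wedge (y_4 \leftrightarrow \neg y_5) \\
&\wedge (y_5 \leftrightarrow (y_6 \vee x_4)) \\
&\wedge (y_6 \leftrightarrow (\neg x_1 \leftrightarrow x_3)) .
\end{aligned}$$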


Observe that the formula $\phi'$ thus obtained is a conjunction of clauses $\phi'_i$, each of
which has at most 3 literals. The only requirement that we might fail to meet is
that each clause has to be an OR of 3 literals.
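The first step can be sketched as a breadth-first walk over the parse tree. The nested-tuple encoding of formulas and the names `introduce_node_variables`, `is_literal`, and `PHI` are hypothetical, chosen only for illustration; the sketch also assumes the root is an internal node rather than a bare literal. Numbering internal nodes in breadth-first order gives the variable names used for formula (34.3).

```python
from collections import deque

SYMBOL = {"and": "AND", "or": "OR", "implies": "->", "iff": "<->"}


def is_literal(node):
    """A leaf of the parse tree: a variable or a negated variable."""
    return node[0] == "var" or (node[0] == "not" and node[1][0] == "var")


def literal_text(node):
    return node[1] if node[0] == "var" else "NOT " + node[1][1]


def introduce_node_variables(root):
    """Return [root variable, clause for y1, clause for y2, ...] as strings."""
    parts = ["y1"]
    queue = deque([(root, "y1")])
    counter = 1
    while queue:
        node, name = queue.popleft()
        operands = []
        for child in node[1:]:
            if is_literal(child):
                operands.append(literal_text(child))
            else:
                # A new variable stands for the output of this internal node.
                counter += 1
                operands.append(f"y{counter}")
                queue.append((child, f"y{counter}"))
        if node[0] == "not":
            rhs = "NOT " + operands[0]
        else:
            rhs = f"({operands[0]} {SYMBOL[node[0]]} {operands[1]})"
        parts.append(f"({name} <-> {rhs})")
    return parts


def var(name):
    return ("var", name)


# Formula (34.3) as a parse tree.
PHI = ("and",
       ("or",
        ("implies", var("x1"), var("x2")),
        ("not", ("or", ("iff", ("not", var("x1")), var("x3")), var("x4")))),
       ("not", var("x2")))

clauses = introduce_node_variables(PHI)
```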

The second step of the reduction converts each clause $\phi'_i$ into conjunctive normal
form. We construct a truth table for $\phi'_i$ by evaluating all possible assignments to
its variables. Each row of the truth table consists of a possible assignment of the
variables of the clause, together with the value of the clause under that assignment.
Using the truth-table entries that evaluate to 0, we build a formula in disjunctive
normal form (or DNF), that is, an OR of ANDs, that is equivalent to $\neg\phi'_i$. We then
negate this formula and convert it into a CNF formula $\phi''_i$ by using DeMorgan’s
laws for propositional logic,

$$\neg(a \wedge b) = \neg a \vee \neg b ,$$

$$\neg(a \vee b) = \neg a \wedge \neg b ,$$

to complement all literals, change ORs into ANDs, and change ANDs into ORs.

Figure 34.12 The truth table for the clause $(y_1 \leftrightarrow (y_2 \wedge \neg x_2))$.

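In our example, we convert the clause $\phi'_1 = (y_1 \leftrightarrow (y_2 \wedge \neg x_2))$ into CNF
as follows. The truth table for $\phi'_1$ appears in Figure 34.12. The DNF formula
equivalent to $\neg\phi'_1$ is

$$(y_1 \wedge y_2 \wedge x_2) \vee (y_1 \wedge \neg y_2 \wedge x_2) \vee (y_1 \wedge \neg y_2 \wedge \neg x_2) \vee (\neg y_1 \wedge y_2 \wedge \neg x_2) .$$

Negating and applying DeMorgan’s laws, we get the CNF formula

$$\begin{aligned}
\phi''_1 = {} & (\neg y_1 \vee \neg y_2 \vee \neg x_2) \wedge (\neg y_1 \vee y_2 \vee \neg x_2) \\
& \wedge (\neg y_1 \vee y_2 \vee x_2) \wedge (y_1 \vee \neg y_2 \vee x_2) ,
\end{aligned}$$

which is equivalent to the original clause $\phi'_1$.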
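The truth-table step can be sketched as below. Representing a literal as a `(name, polarity)` pair, with `False` meaning negated, is an assumption for illustration, as are the names `clause_to_cnf`, `eval_cnf`, and `phi1`. Every row on which the clause is 0 contributes one OR-clause in which each variable appears with the opposite of its value in that row, which is the effect of negating the DNF with DeMorgan’s laws.

```python
from itertools import product


def clause_to_cnf(names, func):
    """Return CNF clauses equivalent to func, a boolean function of the named variables."""
    cnf = []
    for row in product([False, True], repeat=len(names)):
        if not func(*row):
            # Negating the row's AND-term flips every literal:
            # a variable that is 1 in the row appears negated, and vice versa.
            cnf.append([(name, not value) for name, value in zip(names, row)])
    return cnf


def eval_cnf(cnf, assignment):
    """A CNF formula holds when every clause contains a literal that is true."""
    return all(
        any(assignment[name] == polarity for name, polarity in clause)
        for clause in cnf
    )


def phi1(y1, y2, x2):
    """The clause y1 <-> (y2 AND NOT x2)."""
    return y1 == (y2 and not x2)


names = ["y1", "y2", "x2"]
cnf = clause_to_cnf(names, phi1)
```

Since a clause over at most 3 variables has at most 8 rows, this step adds only a constant amount of work per clause.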

At this point, we have converted each clause $\phi'_i$ of the formula $\phi'$ into a CNF
formula $\phi''_i$, and thus $\phi'$ is equivalent to the CNF formula $\phi''$ consisting of the
conjunction of the $\phi''_i$. Moreover, each clause of $\phi''$ has at most 3 literals.

The third and final step of the reduction further transforms the formula so that
each clause has exactly 3 distinct literals. We construct the final 3-CNF formula $\phi'''$
from the clauses of the CNF formula $\phi''$. The formula $\phi'''$ also uses two auxiliary
variables that we shall call $p$ and $q$. For each clause $C_i$ of $\phi''$, we include the
following clauses in $\phi'''$:

•  If $C_i$ has 3 distinct literals, then simply include $C_i$ as a clause of $\phi'''$.

•  If $C_i$ has 2 distinct literals, that is, if $C_i = (l_1 \vee l_2)$, where $l_1$ and $l_2$ are literals,
then include $(l_1 \vee l_2 \vee p) \wedge (l_1 \vee l_2 \vee \neg p)$ as clauses of $\phi'''$. The literals
$p$ and $\neg p$ merely fulfill the syntactic requirement that each clause of $\phi'''$ has
exactly 3 distinct literals. Whether $p = 0$ or $p = 1$, one of the clauses is
equivalent to $l_1 \vee l_2$, and the other evaluates to 1, which is the identity for AND.

•  If $C_i$ has just 1 distinct literal $l$, then include $(l \vee p \vee q) \wedge (l \vee p \vee \neg q) \wedge
(l \vee \neg p \vee q) \wedge (l \vee \neg p \vee \neg q)$ as clauses of $\phi'''$. Regardless of the values of $p$
and $q$, one of the four clauses is equivalent to $l$, and the other 3 evaluate to 1.
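The three cases above can be sketched as a small padding routine. Literals are again written as `(name, polarity)` pairs, and the names `pad_clause` and `satisfied` are hypothetical; the sketch assumes that no variable of the input formula is already called `p` or `q`.

```python
from itertools import product


def pad_clause(clause):
    """Expand a clause of 1, 2, or 3 distinct literals into clauses of exactly 3."""
    clause = list(clause)
    p, not_p = ("p", True), ("p", False)
    q, not_q = ("q", True), ("q", False)
    if len(clause) == 3:
        return [clause]
    if len(clause) == 2:
        return [clause + [p], clause + [not_p]]
    if len(clause) == 1:
        return [
            clause + [p, q],
            clause + [p, not_q],
            clause + [not_p, q],
            clause + [not_p, not_q],
        ]
    raise ValueError("expected a clause with 1, 2, or 3 literals")


def satisfied(clauses, assignment):
    """True when every clause has at least one literal that is true."""
    return all(
        any(assignment[name] == polarity for name, polarity in clause)
        for clause in clauses
    )


one_literal = pad_clause([("a", True)])
two_literals = pad_clause([("a", True), ("b", False)])
```

Whatever values `p` and `q` take, the padded clauses evaluate to the same value as the original clause, which is why the padding preserves satisfiability.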

We can see that the 3-CNF formula $\phi'''$ is satisfiable if and only if $\phi$ is satisfiable
by inspecting each of the three steps. Like the reduction from CIRCUIT-SAT to
SAT, the construction of $\phi'$ from $\phi$ in the first step preserves satisfiability. The
second step produces a CNF formula $\phi''$ that is algebraically equivalent to $\phi'$. The
third step produces a 3-CNF formula $\phi'''$ that is effectively equivalent to $\phi''$, since
any assignment to the variables $p$ and $q$ produces a formula that is algebraically
equivalent to $\phi''$.

We must also show that the reduction can be computed in polynomial time.
Constructing $\phi'$ from $\phi$ introduces at most 1 variable and 1 clause per connective in $\phi$.
Constructing $\phi''$ from $\phi'$ can introduce at most 8 clauses into $\phi''$ for each clause
from $\phi'$, since each clause of $\phi'$ has at most 3 variables, and the truth table for
each clause has at most $2^3 = 8$ rows. The construction of $\phi'''$ from $\phi''$ introduces
at most 4 clauses into $\phi'''$ for each clause of $\phi''$. Thus, the size of the resulting
formula $\phi'''$ is polynomial in the length of the original formula. Each of the
constructions can easily be accomplished in polynomial time. ■

Exercises 


34.4-1 

Consider the straightforward (nonpolynomial-time) reduction in the proof of
Theorem 34.9. Describe a circuit of size $n$ that, when converted to a formula by this
method, yields a formula whose size is exponential in $n$.


34.4-2 

Show  the  3-CNF  formula  that  results  when  we  use  the  method  of  Theorem  34.10 
on  the  formula  (34.3). 


34.4-3 

Professor Jagger proposes to show that SAT $\le_P$ 3-CNF-SAT by using only the
truth-table technique in the proof of Theorem 34.10, and not the other steps. That
is, the professor proposes to take the boolean formula $\phi$, form a truth table for
its variables, derive from the truth table a formula in 3-DNF that is equivalent
to $\neg\phi$, and then negate and apply DeMorgan’s laws to produce a 3-CNF formula
equivalent to $\phi$. Show that this strategy does not yield a polynomial-time reduction.


34.4-4

Show that the problem of determining whether a boolean formula is a tautology is
complete for co-NP. (Hint: See Exercise 34.3-7.)


34.4-5 

Show that the problem of determining the satisfiability of boolean formulas in
disjunctive normal form is polynomial-time solvable.


34.4-6 

Suppose  that  someone  gives  you  a  polynomial-time  algorithm  to  decide  formula 
satisfiability.  Describe  how  to  use  this  algorithm  to  find  satisfying  assignments  in 
polynomial  time. 


34.4-7 

Let 2-CNF-SAT be the set of satisfiable boolean formulas in CNF with exactly 2
literals per clause. Show that 2-CNF-SAT $\in \mathrm{P}$. Make your algorithm as efficient as
possible. (Hint: Observe that $x \vee y$ is equivalent to $\neg x \rightarrow y$. Reduce 2-CNF-SAT
to an efficiently solvable problem on a directed graph.)


34.5  NP-complete  problems 

NP-complete problems arise in diverse domains: boolean logic, graphs, arithmetic,
network design, sets and partitions, storage and retrieval, sequencing and scheduling,
mathematical programming, algebra and number theory, games and puzzles,
automata and language theory, program optimization, biology, chemistry, physics,
and more. In this section, we shall use the reduction methodology to provide
NP-completeness proofs for a variety of problems drawn from graph theory and set
partitioning.

Figure  34.13  outlines  the  structure  of  the  NP-completeness  proofs  in  this  section 
and  Section  34.4.  We  prove  each  language  in  the  figure  to  be  NP-complete  by 
reduction  from  the  language  that  points  to  it.  At  the  root  is  CIRCUIT-SAT,  which 
we  proved  NP-complete  in  Theorem  34.7. 

34.5.1  The  clique  problem 

A clique in an undirected graph $G = (V, E)$ is a subset $V' \subseteq V$ of vertices, each
pair of which is connected by an edge in $E$. In other words, a clique is a complete
subgraph of $G$. The size of a clique is the number of vertices it contains. The
clique problem is the optimization problem of finding a clique of maximum size in
a graph. As a decision problem, we ask simply whether a clique of a given size $k$
exists in the graph. The formal definition is

CLIQUE = $\{\langle G, k \rangle : G \text{ is a graph containing a clique of size } k\}$.

Figure 34.13 The structure of NP-completeness proofs in Sections 34.4 and 34.5. All proofs
ultimately follow by reduction from the NP-completeness of CIRCUIT-SAT.

A naive algorithm for determining whether a graph $G = (V, E)$ with $|V|$ vertices
has a clique of size $k$ is to list all $k$-subsets of $V$, and check each one to
see whether it forms a clique. The running time of this algorithm is $\Omega(k^2 \binom{|V|}{k})$,
which is polynomial if $k$ is a constant. In general, however, $k$ could be near $|V|/2$,
in which case the algorithm runs in superpolynomial time. Indeed, an efficient
algorithm for the clique problem is unlikely to exist.
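The naive algorithm translates almost directly into code. In this sketch the graph is assumed to be given as a vertex list and a list of undirected edges, and the name `has_clique` is hypothetical. The outer loop runs over all $k$-subsets and the inner check examines every pair in the subset, which matches the bound stated above.

```python
from itertools import combinations


def has_clique(vertices, edges, k):
    """Return True if some k-subset of the vertices is pairwise adjacent."""
    edge_set = {frozenset(e) for e in edges}
    for subset in combinations(vertices, k):
        # A subset is a clique when every pair of its vertices is an edge.
        if all(frozenset(pair) in edge_set for pair in combinations(subset, 2)):
            return True
    return False


# A triangle on 1, 2, 3 with a pendant vertex 4.
example_vertices = [1, 2, 3, 4]
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
```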

Theorem  34.11 

The  clique  problem  is  NP-complete. 

Proof  To show that CLIQUE $\in \mathrm{NP}$, for a given graph $G = (V, E)$, we use the
set $V' \subseteq V$ of vertices in the clique as a certificate for $G$. We can check whether $V'$
is a clique in polynomial time by checking whether, for each pair $u, v \in V'$, the
edge $(u, v)$ belongs to $E$.

We next prove that 3-CNF-SAT $\le_P$ CLIQUE, which shows that the clique problem
is NP-hard. You might be surprised that we should be able to prove such a
result, since on the surface logical formulas seem to have little to do with graphs.

The reduction algorithm begins with an instance of 3-CNF-SAT. Let
$\phi = C_1 \wedge C_2 \wedge \cdots \wedge C_k$ be a boolean formula in 3-CNF with $k$ clauses. For
$r = 1, 2, \ldots, k$, each clause $C_r$ has exactly three distinct literals $l_1^r$, $l_2^r$, and $l_3^r$. We shall
construct a graph $G$ such that $\phi$ is satisfiable if and only if $G$ has a clique of size $k$.

Figure 34.14 The graph $G$ derived from the 3-CNF formula $\phi = C_1 \wedge C_2 \wedge C_3$, where
$C_1 = (x_1 \vee \neg x_2 \vee \neg x_3)$, $C_2 = (\neg x_1 \vee x_2 \vee x_3)$, and $C_3 = (x_1 \vee x_2 \vee x_3)$, in reducing
3-CNF-SAT to CLIQUE. A satisfying assignment of the formula has $x_2 = 0$, $x_3 = 1$, and $x_1$
either 0 or 1. This assignment satisfies $C_1$ with $\neg x_2$, and it satisfies $C_2$ and $C_3$ with $x_3$,
corresponding to the clique with lightly shaded vertices.

We construct the graph $G = (V, E)$ as follows. For each clause
$C_r = (l_1^r \vee l_2^r \vee l_3^r)$ in $\phi$, we place a triple of vertices $v_1^r$, $v_2^r$, and $v_3^r$ into $V$. We put
an edge between two vertices $v_i^r$ and $v_j^s$ if both of the following hold:

•  $v_i^r$ and $v_j^s$ are in different triples, that is, $r \ne s$, and

•  their corresponding literals are consistent, that is, $l_i^r$ is not the negation of $l_j^s$.

We can easily build this graph from $\phi$ in polynomial time. As an example of this
construction, if we have

$$\phi = (x_1 \vee \neg x_2 \vee \neg x_3) \wedge (\neg x_1 \vee x_2 \vee x_3) \wedge (x_1 \vee x_2 \vee x_3) ,$$

then $G$ is the graph shown in Figure 34.14.
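The graph construction can be sketched as follows. Literals are written as `(variable, polarity)` pairs and vertex $v_i^r$ as the index pair `(r, i)`, counting from 0; these encodings and the name `cnf3_to_clique_graph` are assumptions for illustration. The double loop over vertex pairs makes the polynomial running time visible.

```python
from itertools import combinations


def cnf3_to_clique_graph(clauses):
    """Build (vertices, edges) from a 3-CNF formula given as a list of clauses."""
    vertices = [(r, i) for r, clause in enumerate(clauses) for i in range(len(clause))]
    edges = set()
    for (r, i), (s, j) in combinations(vertices, 2):
        var_a, pol_a = clauses[r][i]
        var_b, pol_b = clauses[s][j]
        # Two literals are inconsistent when one is the negation of the other.
        consistent = not (var_a == var_b and pol_a != pol_b)
        if r != s and consistent:
            edges.add(frozenset([(r, i), (s, j)]))
    return vertices, edges


# (x1 OR NOT x2 OR NOT x3) AND (NOT x1 OR x2 OR x3) AND (x1 OR x2 OR x3)
PHI = [
    [("x1", True), ("x2", False), ("x3", False)],
    [("x1", False), ("x2", True), ("x3", True)],
    [("x1", True), ("x2", True), ("x3", True)],
]
vertices, edges = cnf3_to_clique_graph(PHI)

# The vertices for NOT x2 in C1, x3 in C2, and x3 in C3.
clique = [(0, 1), (1, 2), (2, 2)]
```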

We must show that this transformation of $\phi$ into $G$ is a reduction. First, suppose
that $\phi$ has a satisfying assignment. Then each clause $C_r$ contains at least one
literal $l_i^r$ that is assigned 1, and each such literal corresponds to a vertex $v_i^r$. Picking
one such “true” literal from each clause yields a set $V'$ of $k$ vertices. We claim that
$V'$ is a clique. For any two vertices $v_i^r, v_j^s \in V'$, where $r \ne s$, both corresponding
literals $l_i^r$ and $l_j^s$ map to 1 by the given satisfying assignment, and thus the literals
cannot be complements. Thus, by the construction of $G$, the edge $(v_i^r, v_j^s)$ belongs
to $E$.

Conversely, suppose that $G$ has a clique $V'$ of size $k$. No edges in $G$ connect
vertices in the same triple, and so $V'$ contains exactly one vertex per triple. We can
assign 1 to each literal $l_i^r$ such that $v_i^r \in V'$ without fear of assigning 1 to both a
literal and its complement, since $G$ contains no edges between inconsistent literals.
Each clause is satisfied, and so $\phi$ is satisfied. (Any variables that do not correspond
to a vertex in the clique may be set arbitrarily.) ■

In the example of Figure 34.14, a satisfying assignment of $\phi$ has $x_2 = 0$ and
$x_3 = 1$. A corresponding clique of size $k = 3$ consists of the vertices corresponding
to $\neg x_2$ from the first clause, $x_3$ from the second clause, and $x_3$ from the third
clause. Because the clique contains no vertices corresponding to either $x_1$ or $\neg x_1$,
we can set $x_1$ to either 0 or 1 in this satisfying assignment.

Observe that in the proof of Theorem 34.11, we reduced an arbitrary instance
of 3-CNF-SAT to an instance of CLIQUE with a particular structure. You might
think that we have shown only that CLIQUE is NP-hard in graphs in which the
vertices are restricted to occur in triples and in which there are no edges between
vertices in the same triple. Indeed, we have shown that CLIQUE is NP-hard only
in this restricted case, but this proof suffices to show that CLIQUE is NP-hard in
general graphs. Why? If we had a polynomial-time algorithm that solved CLIQUE
on general graphs, it would also solve CLIQUE on restricted graphs.

The opposite approach, reducing instances of 3-CNF-SAT with a special structure
to general instances of CLIQUE, would not have sufficed, however. Why
not? Perhaps the instances of 3-CNF-SAT that we chose to reduce from were
“easy,” and so we would not have reduced an NP-hard problem to CLIQUE.

Observe also that the reduction used the instance of 3-CNF-SAT, but not the
solution. We would have erred if the polynomial-time reduction had relied on
knowing whether the formula $\phi$ is satisfiable, since we do not know how to decide
whether $\phi$ is satisfiable in polynomial time.

34.5.2  The  vertex-cover  problem 

A vertex cover of an undirected graph $G = (V, E)$ is a subset $V' \subseteq V$ such that
if $(u, v) \in E$, then $u \in V'$ or $v \in V'$ (or both). That is, each vertex “covers” its
incident edges, and a vertex cover for $G$ is a set of vertices that covers all the edges
in $E$. The size of a vertex cover is the number of vertices in it. For example, the
graph in Figure 34.15(b) has a vertex cover $\{w, z\}$ of size 2.

The vertex-cover problem is to find a vertex cover of minimum size in a given
graph. Restating this optimization problem as a decision problem, we wish to
determine whether a graph has a vertex cover of a given size $k$. As a language, we
define

VERTEX-COVER = $\{\langle G, k \rangle : \text{graph } G \text{ has a vertex cover of size } k\}$.

Figure 34.15 Reducing CLIQUE to VERTEX-COVER. (a) An undirected graph $G = (V, E)$ with
clique $V' = \{u, v, x, y\}$. (b) The graph $\overline{G}$ produced by the reduction algorithm that has vertex
cover $V - V' = \{w, z\}$.

The  following  theorem  shows  that  this  problem  is  NP-complete. 

Theorem  34.12 

The vertex-cover problem is NP-complete.

Proof  We first show that VERTEX-COVER $\in \mathrm{NP}$. Suppose we are given a graph
$G = (V, E)$ and an integer $k$. The certificate we choose is the vertex cover $V' \subseteq V$
itself. The verification algorithm affirms that $|V'| = k$, and then it checks, for each
edge $(u, v) \in E$, that $u \in V'$ or $v \in V'$. We can easily verify the certificate in
polynomial time.
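The verification algorithm just described can be sketched in a few lines. The function name `verify_vertex_cover` and the list-based graph representation are assumptions for illustration. The check makes one pass over the edges, so it clearly runs in polynomial time.

```python
def verify_vertex_cover(vertices, edges, k, certificate):
    """Accept when the certificate is a set of k vertices touching every edge."""
    cover = set(certificate)
    # The certificate must consist of exactly k vertices of the graph.
    if len(cover) != k or not cover <= set(vertices):
        return False
    # Every edge needs at least one endpoint in the cover.
    return all(u in cover or v in cover for u, v in edges)


# A path a - b - c, which vertex b alone covers.
path_vertices = ["a", "b", "c"]
path_edges = [("a", "b"), ("b", "c")]
```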

We prove that the vertex-cover problem is NP-hard by showing that CLIQUE $\le_P$
VERTEX-COVER. This reduction relies on the notion of the “complement” of a
graph. Given an undirected graph $G = (V, E)$, we define the complement of $G$
as $\overline{G} = (V, \overline{E})$, where $\overline{E} = \{(u, v) : u, v \in V, u \ne v, \text{ and } (u, v) \notin E\}$. In other
words, $\overline{G}$ is the graph containing exactly those edges that are not in $G$. Figure 34.15
shows a graph and its complement and illustrates the reduction from CLIQUE to
VERTEX-COVER.

The reduction algorithm takes as input an instance $\langle G, k \rangle$ of the clique problem.
It computes the complement $\overline{G}$, which we can easily do in polynomial time. The
output of the reduction algorithm is the instance $\langle \overline{G}, |V| - k \rangle$ of the vertex-cover
problem. To complete the proof, we show that this transformation is indeed a
reduction: the graph $G$ has a clique of size $k$ if and only if the graph $\overline{G}$ has a vertex
cover of size $|V| - k$.

Suppose that $G$ has a clique $V' \subseteq V$ with $|V'| = k$. We claim that $V - V'$ is a
vertex cover in $\overline{G}$. Let $(u, v)$ be any edge in $\overline{E}$. Then, $(u, v) \notin E$, which implies
that at least one of $u$ or $v$ does not belong to $V'$, since every pair of vertices in $V'$ is
connected by an edge of $E$. Equivalently, at least one of $u$ or $v$ is in $V - V'$, which
means that edge $(u, v)$ is covered by $V - V'$. Since $(u, v)$ was chosen arbitrarily
from $\overline{E}$, every edge of $\overline{E}$ is covered by a vertex in $V - V'$. Hence, the set $V - V'$,
which has size $|V| - k$, forms a vertex cover for $\overline{G}$.

Conversely, suppose that $\overline{G}$ has a vertex cover $V' \subseteq V$, where $|V'| = |V| - k$.
Then, for all $u, v \in V$, if $(u, v) \in \overline{E}$, then $u \in V'$ or $v \in V'$ or both. The
contrapositive of this implication is that for all $u, v \in V$, if $u \notin V'$ and $v \notin V'$,
then $(u, v) \in E$. In other words, $V - V'$ is a clique, and it has size $|V| - |V'| = k$. ■
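The reduction itself is short enough to sketch, together with brute-force checkers that confirm the equivalence on a small graph. The six-vertex graph below is a hypothetical example with a clique on u, v, x, y (it is not claimed to be the graph of Figure 34.15), and all function names are illustrative. Only `clique_to_vertex_cover` is the reduction; the exponential-time checkers exist purely to test it.

```python
from itertools import combinations


def complement_edges(vertices, edges):
    """Edges of the complement graph: all vertex pairs that are not edges."""
    present = {frozenset(e) for e in edges}
    return [pair for pair in combinations(vertices, 2) if frozenset(pair) not in present]


def clique_to_vertex_cover(vertices, edges, k):
    """Map a CLIQUE instance (G, k) to the VERTEX-COVER instance (complement of G, |V| - k)."""
    return vertices, complement_edges(vertices, edges), len(vertices) - k


def has_clique(vertices, edges, k):
    present = {frozenset(e) for e in edges}
    return any(
        all(frozenset(pair) in present for pair in combinations(subset, 2))
        for subset in combinations(vertices, k)
    )


def has_vertex_cover(vertices, edges, k):
    return any(
        all(u in subset or v in subset for u, v in edges)
        for subset in map(set, combinations(vertices, k))
    )


V = ["u", "v", "w", "x", "y", "z"]
E = [("u", "v"), ("u", "x"), ("u", "y"), ("v", "x"), ("v", "y"), ("x", "y"),
     ("u", "w"), ("y", "z")]
```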

Since VERTEX-COVER is NP-complete, we don’t expect to find a polynomial-time
algorithm for finding a minimum-size vertex cover. Section 35.1 presents a
polynomial-time “approximation algorithm,” however, which produces “approximate”
solutions for the vertex-cover problem. The size of a vertex cover produced
by the algorithm is at most twice the minimum size of a vertex cover.

Thus,  we  shouldn’t  give  up  hope  just  because  a  problem  is  NP-complete.  We 
may  be  able  to  design  a  polynomial-time  approximation  algorithm  that  obtains 
near-optimal  solutions,  even  though  finding  an  optimal  solution  is  NP-complete. 
Chapter  35  gives  several  approximation  algorithms  for  NP-complete  problems. 

34.5.3  The  hamiltonian-cycle  problem 

We  now  return  to  the  hamiltonian-cycle  problem  defined  in  Section  34.2. 

Theorem  34.13 

The  hamiltonian  cycle  problem  is  NP-complete. 

Proof  We first show that HAM-CYCLE belongs to NP. Given a graph $G = (V, E)$,
our certificate is the sequence of $|V|$ vertices that makes up the hamiltonian
cycle. The verification algorithm checks that this sequence contains each vertex
in $V$ exactly once and that with the first vertex repeated at the end, it forms
a cycle in $G$. That is, it checks that there is an edge between each pair of
consecutive vertices and between the first and last vertices. We can verify the
certificate in polynomial time.
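The verification step just described can be sketched as follows. The function name `verify_ham_cycle` and the example graph are hypothetical. The requirement of at least three vertices is an added assumption, reflecting the usual convention that a simple cycle in an undirected graph has at least three vertices.

```python
def verify_ham_cycle(vertices, edges, certificate):
    """Return True if certificate lists every vertex exactly once and
    consecutive vertices (wrapping around to the first) are adjacent."""
    edge_set = {frozenset(e) for e in edges}
    n = len(certificate)
    # Assumption: a cycle in an undirected graph needs at least 3 vertices.
    if n < 3 or n != len(vertices) or set(certificate) != set(vertices):
        return False
    return all(
        frozenset((certificate[i], certificate[(i + 1) % n])) in edge_set
        for i in range(n)
    )

# Hypothetical example: a 4-cycle a-b-c-d-a.
square_vertices = ["a", "b", "c", "d"]
square_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
```

The check uses one set lookup per consecutive pair, so its running time is polynomial (in fact linear) in the size of the graph and certificate.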

We now prove that VERTEX-COVER $\le_P$ HAM-CYCLE, which shows that
HAM-CYCLE is NP-complete. Given an undirected graph $G = (V, E)$ and an
integer $k$, we construct an undirected graph $G' = (V', E')$ that has a
hamiltonian cycle if and only if $G$ has a vertex cover of size $k$.

Figure 34.16  The widget used in reducing the vertex-cover problem to the
hamiltonian-cycle problem. An edge $(u, v)$ of graph $G$ corresponds to widget
$W_{uv}$ in the graph $G'$ created in the reduction. (a) The widget, with
individual vertices labeled. (b)-(d) The shaded paths are the only possible ones
through the widget that include all vertices, assuming that the only connections
from the widget to the remainder of $G'$ are through vertices $[u, v, 1]$,
$[u, v, 6]$, $[v, u, 1]$, and $[v, u, 6]$.

Our construction uses a widget, which is a piece of a graph that enforces certain
properties. Figure 34.16(a) shows the widget we use. For each edge
$(u, v) \in E$, the graph $G'$ that we construct will contain one copy of this
widget, which we denote by $W_{uv}$. We denote each vertex in $W_{uv}$ by
$[u, v, i]$ or $[v, u, i]$, where $1 \le i \le 6$, so that each widget $W_{uv}$
contains 12 vertices. Widget $W_{uv}$ also contains the 14 edges shown in
Figure 34.16(a).

Along with the internal structure of the widget, we enforce the properties we
want by limiting the connections between the widget and the remainder of the
graph $G'$ that we construct. In particular, only vertices $[u, v, 1]$,
$[u, v, 6]$, $[v, u, 1]$, and $[v, u, 6]$ will have edges incident from outside
$W_{uv}$. Any hamiltonian cycle of $G'$ must traverse the edges of $W_{uv}$ in
one of the three ways shown in Figures 34.16(b)-(d). If the cycle enters through
vertex $[u, v, 1]$, it must exit through vertex $[u, v, 6]$, and it either
visits all 12 of the widget's vertices (Figure 34.16(b)) or the six vertices
$[u, v, 1]$ through $[u, v, 6]$ (Figure 34.16(c)). In the latter case, the cycle
will have to reenter the widget to visit vertices $[v, u, 1]$ through
$[v, u, 6]$. Similarly, if the cycle enters through vertex $[v, u, 1]$, it must
exit through vertex $[v, u, 6]$, and it either visits all 12 of the widget's
vertices (Figure 34.16(d)) or the six vertices $[v, u, 1]$ through $[v, u, 6]$
(Figure 34.16(c)). No other paths through the widget that visit all 12 vertices
are possible. In particular, it is impossible to construct two vertex-disjoint
paths, one of which connects $[u, v, 1]$ to $[v, u, 6]$ and the other of which
connects $[v, u, 1]$ to $[u, v, 6]$, such that the union of the two paths
contains all of the widget's vertices.

Figure 34.17  Reducing an instance of the vertex-cover problem to an instance of
the hamiltonian-cycle problem. (a) An undirected graph $G$ with a vertex cover of
size 2, consisting of the lightly shaded vertices $w$ and $y$. (b) The undirected
graph $G'$ produced by the reduction, with the hamiltonian path corresponding to
the vertex cover shaded. The vertex cover $\{w, y\}$ corresponds to edges
$(s_1, [w, x, 1])$ and $(s_2, [y, x, 1])$ appearing in the hamiltonian cycle.


The only vertices in $V'$ other than those of widgets are selector vertices
$s_1, s_2, \ldots, s_k$. We use edges incident on selector vertices in $G'$ to
select the $k$ vertices of the cover in $G$.

In addition to the edges in widgets, $E'$ contains two other types of edges,
which Figure 34.17 shows. First, for each vertex $u \in V$, we add edges to join
pairs of widgets in order to form a path containing all widgets corresponding to
edges incident on $u$ in $G$. We arbitrarily order the vertices adjacent to each
vertex $u \in V$ as $u^{(1)}, u^{(2)}, \ldots, u^{(\mathrm{degree}(u))}$, where
$\mathrm{degree}(u)$ is the number of vertices adjacent to $u$. We create a path
in $G'$ through all the widgets corresponding to edges incident on $u$ by adding
to $E'$ the edges
$\{([u, u^{(i)}, 6], [u, u^{(i+1)}, 1]) : 1 \le i \le \mathrm{degree}(u) - 1\}$.
In Figure 34.17, for example, we order the vertices adjacent to $w$ as
$x, y, z$, and so graph $G'$ in part (b) of the figure includes the edges
$([w, x, 6], [w, y, 1])$ and $([w, y, 6], [w, z, 1])$. For each vertex
$u \in V$, these edges in $G'$ fill in a path containing all widgets
corresponding to edges incident on $u$ in $G$.

The intuition behind these edges is that if we choose a vertex $u \in V$ in the
vertex cover of $G$, we can construct a path from $[u, u^{(1)}, 1]$ to
$[u, u^{(\mathrm{degree}(u))}, 6]$ in $G'$ that "covers" all widgets
corresponding to edges incident on $u$. That is, for each of these widgets, say
$W_{u,u^{(i)}}$, the path either includes all 12 vertices (if $u$ is in the
vertex cover but $u^{(i)}$ is not) or just the six vertices $[u, u^{(i)}, 1],
[u, u^{(i)}, 2], \ldots, [u, u^{(i)}, 6]$ (if both $u$ and $u^{(i)}$ are in the
vertex cover).

The final type of edge in $E'$ joins the first vertex $[u, u^{(1)}, 1]$ and the
last vertex $[u, u^{(\mathrm{degree}(u))}, 6]$ of each of these paths to each of
the selector vertices. That is, we include the edges

    $\{(s_j, [u, u^{(1)}, 1]) : u \in V \text{ and } 1 \le j \le k\}$
    $\cup \; \{(s_j, [u, u^{(\mathrm{degree}(u))}, 6]) : u \in V \text{ and } 1 \le j \le k\}$.

Next, we show that the size of $G'$ is polynomial in the size of $G$, and hence
we can construct $G'$ in time polynomial in the size of $G$. The vertices of $G'$
are those in the widgets, plus the selector vertices. With 12 vertices per
widget, plus $k \le |V|$ selector vertices, we have a total of

    $|V'| = 12|E| + k$
    $\phantom{|V'|} \le 12|E| + |V|$

vertices. The edges of $G'$ are those in the widgets, those that go between
widgets, and those connecting selector vertices to widgets. Each widget contains
14 edges, totaling $14|E|$ in all widgets. For each vertex $u \in V$, graph $G'$
has $\mathrm{degree}(u) - 1$ edges going between widgets, so that summed over all
vertices in $V$,

    $\sum_{u \in V} (\mathrm{degree}(u) - 1) = 2|E| - |V|$

edges go between widgets. Finally, $G'$ has two edges for each pair consisting of
a selector vertex and a vertex of $V$, totaling $2k|V|$ such edges. The total
number of edges of $G'$ is therefore

    $|E'| = (14|E|) + (2|E| - |V|) + (2k|V|)$
    $\phantom{|E'|} = 16|E| + (2k - 1)|V|$
    $\phantom{|E'|} \le 16|E| + (2|V| - 1)|V|$.
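As a quick sanity check of these counts, the sketch below tallies the vertices and edges of $G'$ directly from an adjacency list and compares the result with the closed form $16|E| + (2k - 1)|V|$. The function name `reduction_graph_size` and the four-vertex example graph are hypothetical. The sketch also assumes that every vertex has degree at least 1, so that each term $\mathrm{degree}(u) - 1$ is nonnegative.

```python
def reduction_graph_size(adj, k):
    """Return (|V'|, |E'|) for the graph built by the reduction.

    adj maps each vertex of G to the collection of its neighbors.
    Assumes every vertex has at least one neighbor.
    """
    n = len(adj)                                         # |V|
    m = sum(len(nbrs) for nbrs in adj.values()) // 2     # |E|
    num_vertices = 12 * m + k                            # widget + selector vertices
    widget_edges = 14 * m                                # 14 edges per widget
    chain_edges = sum(len(nbrs) - 1 for nbrs in adj.values())  # between widgets
    selector_edges = 2 * k * n                           # two per (selector, vertex) pair
    return num_vertices, widget_edges + chain_edges + selector_edges

# Hypothetical example graph with 4 vertices and 4 edges, and k = 2.
example_adj = {
    "w": ["x", "y", "z"],
    "x": ["w", "y"],
    "y": ["w", "x"],
    "z": ["w"],
}
example_size = reduction_graph_size(example_adj, 2)
```

For this example, $|E| = 4$, $|V| = 4$, and $k = 2$, so the closed form gives $16 \cdot 4 + 3 \cdot 4 = 76$ edges and $12 \cdot 4 + 2 = 50$ vertices.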

Now we show that the transformation from graph $G$ to $G'$ is a reduction. That
is, we must show that $G$ has a vertex cover of size $k$ if and only if $G'$ has
a hamiltonian cycle.

Suppose that $G = (V, E)$ has a vertex cover $V^* \subseteq V$ of size $k$. Let
$V^* = \{u_1, u_2, \ldots, u_k\}$. As Figure 34.17 shows, we form a hamiltonian
cycle in $G'$ by including the following edges$^{10}$ for each vertex
$u_j \in V^*$. Include edges
$\{([u_j, u_j^{(i)}, 6], [u_j, u_j^{(i+1)}, 1]) : 1 \le i \le \mathrm{degree}(u_j) - 1\}$,
which connect all widgets corresponding to edges incident on $u_j$. We also
include the edges within these widgets as Figures 34.16(b)-(d) show, depending on
whether the edge is covered by one or two vertices in $V^*$. The hamiltonian
cycle also includes the edges

    $\{(s_j, [u_j, u_j^{(1)}, 1]) : 1 \le j \le k\}$
    $\cup \; \{(s_{j+1}, [u_j, u_j^{(\mathrm{degree}(u_j))}, 6]) : 1 \le j \le k - 1\}$
    $\cup \; \{(s_1, [u_k, u_k^{(\mathrm{degree}(u_k))}, 6])\}$.

By inspecting Figure 34.17, you can verify that these edges form a cycle. The
cycle starts at $s_1$, visits all widgets corresponding to edges incident on
$u_1$, then visits $s_2$, visits all widgets corresponding to edges incident on
$u_2$, and so on, until it returns to $s_1$. The cycle visits each widget either
once or twice, depending on whether one or two vertices of $V^*$ cover its
corresponding edge. Because $V^*$ is a vertex cover for $G$, each edge in $E$ is
incident on some vertex in $V^*$, and so the cycle visits each vertex in each
widget of $G'$. Because the cycle also visits every selector vertex, it is
hamiltonian.

Conversely, suppose that $G' = (V', E')$ has a hamiltonian cycle
$C \subseteq E'$. We claim that the set

    $V^* = \{u \in V : (s_j, [u, u^{(1)}, 1]) \in C \text{ for some } 1 \le j \le k\}$    (34.4)

is a vertex cover for $G$. To see why, partition $C$ into maximal paths that
start at some selector vertex $s_i$, traverse an edge $(s_i, [u, u^{(1)}, 1])$
for some $u \in V$, and end at a selector vertex $s_j$ without passing through
any other selector vertex. Let us call each such path a "cover path." From how
$G'$ is constructed, each cover path must start at some $s_i$, take the edge
$(s_i, [u, u^{(1)}, 1])$ for some vertex $u \in V$, pass through all the widgets
corresponding to edges in $E$ incident on $u$, and then end at some selector
vertex $s_j$. We refer to this cover path as $p_u$, and by equation (34.4), we
put $u$ into $V^*$. Each widget visited by $p_u$ must be $W_{uv}$ or $W_{vu}$
for some $v \in V$. For each widget visited by $p_u$, its vertices are visited
by either one or two cover paths. If they are visited by one cover path, then
edge $(u, v) \in E$ is covered in $G$ by vertex $u$. If two cover paths visit the
widget, then the other cover path must be $p_v$, which implies that
$v \in V^*$, and edge $(u, v) \in E$ is covered by both $u$ and $v$. Because each
vertex in each widget is visited by some cover path, we see that each edge in
$E$ is covered by some vertex in $V^*$. ■

$^{10}$ Technically, we define a cycle in terms of vertices rather than edges
(see Section B.4). In the interest of clarity, we abuse notation here and define
the hamiltonian cycle in terms of edges.

Figure 34.18  An instance of the traveling-salesman problem. Shaded edges
represent a minimum-cost tour, with cost 7.


34.5.4  The  traveling-salesman  problem 

In the traveling-salesman problem, which is closely related to the
hamiltonian-cycle problem, a salesman must visit $n$ cities. Modeling the problem
as a complete graph with $n$ vertices, we can say that the salesman wishes to
make a tour, or hamiltonian cycle, visiting each city exactly once and finishing
at the city he starts from. The salesman incurs a nonnegative integer cost
$c(i, j)$ to travel from city $i$ to city $j$, and the salesman wishes to make
the tour whose total cost is minimum, where the total cost is the sum of the
individual costs along the edges of the tour. For example, in Figure 34.18, a
minimum-cost tour is $\langle u, w, v, x, u \rangle$, with cost 7. The formal
language for the corresponding decision problem is

    $\mathrm{TSP} = \{\langle G, c, k \rangle : G = (V, E)$ is a complete graph,
        $c$ is a function from $V \times V \to \mathbb{Z}$,
        $k \in \mathbb{Z}$, and
        $G$ has a traveling-salesman tour with cost at most $k\}$.

The  following  theorem  shows  that  a  fast  algorithm  for  the  traveling-salesman 
problem  is  unlikely  to  exist. 

Theorem  34.14 

The  traveling-salesman  problem  is  NP-complete. 

Proof  We  first  show  that  TSP  belongs  to  NP.  Given  an  instance  of  the  problem, 
we  use  as  a  certificate  the  sequence  of  n  vertices  in  the  tour.  The  verification 
algorithm  checks  that  this  sequence  contains  each  vertex  exactly  once,  sums  up  the 
edge  costs,  and  checks  whether  the  sum  is  at  most  k.  This  process  can  certainly  be 
done in polynomial time.

To prove that TSP is NP-hard, we show that HAM-CYCLE $\le_P$ TSP. Let
$G = (V, E)$ be an instance of HAM-CYCLE. We construct an instance of TSP as
follows. We form the complete graph $G' = (V, E')$, where
$E' = \{(i, j) : i, j \in V \text{ and } i \ne j\}$, and we define the cost
function $c$ by

    $c(i, j) = \begin{cases} 0 & \text{if } (i, j) \in E, \\ 1 & \text{if } (i, j) \notin E. \end{cases}$

(Note that because $G$ is undirected, it has no self-loops, and so $c(v, v) = 1$
for all vertices $v \in V$.) The instance of TSP is then
$\langle G', c, 0 \rangle$, which we can easily create in polynomial time.

We now show that graph $G$ has a hamiltonian cycle if and only if graph $G'$ has
a tour of cost at most 0. Suppose that graph $G$ has a hamiltonian cycle $h$.
Each edge in $h$ belongs to $E$ and thus has cost 0 in $G'$. Thus, $h$ is a tour
in $G'$ with cost 0. Conversely, suppose that graph $G'$ has a tour $h'$ of cost
at most 0. Since the costs of the edges in $E'$ are 0 and 1, the cost of tour
$h'$ is exactly 0 and each edge on the tour must have cost 0. Therefore, $h'$
contains only edges in $E$. We conclude that $h'$ is a hamiltonian cycle in
graph $G$. ■


34.5.5  The  subset-sum  problem 

We next consider an arithmetic NP-complete problem. In the subset-sum problem,
we are given a finite set $S$ of positive integers and an integer target
$t > 0$. We ask whether there exists a subset $S' \subseteq S$ whose elements sum
to $t$. For example, if
$S = \{1, 2, 7, 14, 49, 98, 343, 686, 2409, 2793, 16808, 17206, 117705, 117993\}$
and $t = 138457$, then the subset
$S' = \{1, 2, 7, 98, 343, 686, 2409, 17206, 117705\}$ is a solution.
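The example above can be checked mechanically. In this minimal sketch, the function name `is_subset_sum_certificate` is hypothetical, and the numbers are those of the example.

```python
S = [1, 2, 7, 14, 49, 98, 343, 686, 2409, 2793, 16808, 17206, 117705, 117993]
t = 138457
S_prime = [1, 2, 7, 98, 343, 686, 2409, 17206, 117705]

def is_subset_sum_certificate(S, t, subset):
    """Check that subset has no repeats, lies inside S, and sums to t."""
    no_repeats = len(set(subset)) == len(subset)
    return no_repeats and set(subset) <= set(S) and sum(subset) == t

example_ok = is_subset_sum_certificate(S, t, S_prime)
```

The check uses one addition per element of the subset, which is polynomial in the length of the binary encoding of the input.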

As  usual,  we  define  the  problem  as  a  language: 

    $\text{SUBSET-SUM} = \{\langle S, t \rangle : \text{there exists a subset } S' \subseteq S \text{ such that } t = \sum_{s \in S'} s\}$.

As  with  any  arithmetic  problem,  it  is  important  to  recall  that  our  standard  encoding 
assumes  that  the  input  integers  are  coded  in  binary.  With  this  assumption  in  mind, 
we  can  show  that  the  subset-sum  problem  is  unlikely  to  have  a  fast  algorithm. 

Theorem  34.15 

The  subset-sum  problem  is  NP-complete. 

Proof  To show that SUBSET-SUM is in NP, for an instance
$\langle S, t \rangle$ of the problem, we let the subset $S'$ be the certificate.
A verification algorithm can check whether $t = \sum_{s \in S'} s$ in polynomial
time.

We now show that 3-CNF-SAT $\le_P$ SUBSET-SUM. Given a 3-CNF formula $\phi$ over
variables $x_1, x_2, \ldots, x_n$ with clauses $C_1, C_2, \ldots, C_k$, each
containing exactly three distinct literals, the reduction algorithm constructs an
instance $\langle S, t \rangle$ of the subset-sum problem such that $\phi$ is
satisfiable if and only if there exists a subset of $S$ whose sum is exactly $t$.
Without loss of generality, we make two simplifying assumptions about the formula
$\phi$. First, no clause contains both a variable and its negation, for such a
clause is automatically satisfied by any assignment of values to the variables.
Second, each variable appears in at least one clause, because it does not matter
what value is assigned to a variable that appears in no clauses.

The reduction creates two numbers in set $S$ for each variable $x_i$ and two
numbers in $S$ for each clause $C_j$. We shall create numbers in base 10, where
each number contains $n + k$ digits and each digit corresponds to either one
variable or one clause. Base 10 (and other bases, as we shall see) has the
property we need of preventing carries from lower digits to higher digits.

As  Figure  34.19  shows,  we  construct  set  S  and  target  t  as  follows.  We  label 
each  digit  position  by  either  a  variable  or  a  clause.  The  least  significant  k  digits  are 
labeled  by  the  clauses,  and  the  most  significant  n  digits  are  labeled  by  variables. 

*  The target $t$ has a 1 in each digit labeled by a variable and a 4 in each
   digit labeled by a clause.

*  For each variable $x_i$, set $S$ contains two integers $v_i$ and $v_i'$. Each
   of $v_i$ and $v_i'$ has a 1 in the digit labeled by $x_i$ and 0s in the other
   variable digits. If literal $x_i$ appears in clause $C_j$, then the digit
   labeled by $C_j$ in $v_i$ contains a 1. If literal $\neg x_i$ appears in
   clause $C_j$, then the digit labeled by $C_j$ in $v_i'$ contains a 1. All
   other digits labeled by clauses in $v_i$ and $v_i'$ are 0.

   All $v_i$ and $v_i'$ values in set $S$ are unique. Why? For $l \ne i$, no
   $v_l$ or $v_l'$ values can equal $v_i$ and $v_i'$ in the most significant $n$
   digits. Furthermore, by our simplifying assumptions above, no $v_i$ and
   $v_i'$ can be equal in all $k$ least significant digits. If $v_i$ and $v_i'$
   were equal, then $x_i$ and $\neg x_i$ would have to appear in exactly the
   same set of clauses. But we assume that no clause contains both $x_i$ and
   $\neg x_i$ and that either $x_i$ or $\neg x_i$ appears in some clause, and so
   there must be some clause $C_j$ for which $v_i$ and $v_i'$ differ.

*  For each clause $C_j$, set $S$ contains two integers $s_j$ and $s_j'$. Each of
   $s_j$ and $s_j'$ has 0s in all digits other than the one labeled by $C_j$.
   For $s_j$, there is a 1 in the $C_j$ digit, and $s_j'$ has a 2 in this digit.
   These integers are "slack variables," which we use to get each clause-labeled
   digit position to add to the target value of 4.

   Simple inspection of Figure 34.19 demonstrates that all $s_j$ and $s_j'$
   values in $S$ are unique in set $S$.

Note that the greatest sum of digits in any one digit position is 6, which occurs
in the digits labeled by clauses (three 1s from the $v_i$ and $v_i'$ values, plus
1 and 2 from the $s_j$ and $s_j'$ values). Interpreting these numbers in base 10,
therefore, no carries can occur from lower digits to higher digits.$^{11}$

             x1  x2  x3  C1  C2  C3  C4
    v1   =    1   0   0   1   0   0   1
    v1'  =    1   0   0   0   1   1   0   *
    v2   =    0   1   0   0   0   0   1
    v2'  =    0   1   0   1   1   1   0   *
    v3   =    0   0   1   0   0   1   1   *
    v3'  =    0   0   1   1   1   0   0
    s1   =    0   0   0   1   0   0   0   *
    s1'  =    0   0   0   2   0   0   0   *
    s2   =    0   0   0   0   1   0   0
    s2'  =    0   0   0   0   2   0   0   *
    s3   =    0   0   0   0   0   1   0   *
    s3'  =    0   0   0   0   0   2   0
    s4   =    0   0   0   0   0   0   1   *
    s4'  =    0   0   0   0   0   0   2   *
    t    =    1   1   1   4   4   4   4

Figure 34.19  The reduction of 3-CNF-SAT to SUBSET-SUM. The formula in 3-CNF is
$\phi = C_1 \wedge C_2 \wedge C_3 \wedge C_4$, where
$C_1 = (x_1 \vee \neg x_2 \vee \neg x_3)$,
$C_2 = (\neg x_1 \vee \neg x_2 \vee \neg x_3)$,
$C_3 = (\neg x_1 \vee \neg x_2 \vee x_3)$, and $C_4 = (x_1 \vee x_2 \vee x_3)$.
A satisfying assignment of $\phi$ is $\langle x_1 = 0, x_2 = 0, x_3 = 1 \rangle$.
The set $S$ produced by the reduction consists of the base-10 numbers shown;
reading from top to bottom, $S = \{1001001, 1000110, 100001, 101110, 10011,
11100, 1000, 2000, 100, 200, 10, 20, 1, 2\}$. The target $t$ is 1114444. The
subset $S' \subseteq S$ is lightly shaded (rows marked * above), and it contains
$v_1'$, $v_2'$, and $v_3$, corresponding to the satisfying assignment. It also
contains slack variables $s_1$, $s_1'$, $s_2'$, $s_3$, $s_4$, and $s_4'$ to
achieve the target value of 4 in the digits labeled by $C_1$ through $C_4$.

We can perform the reduction in polynomial time. The set $S$ contains $2n + 2k$
values, each of which has $n + k$ digits, and the time to produce each digit is
polynomial in $n + k$. The target $t$ has $n + k$ digits, and the reduction
produces each in constant time.
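The construction of $S$ and $t$ can be sketched as below. The function name `cnf_to_subset_sum` and the clause encoding are assumptions made for illustration: each clause is a tuple of nonzero integers, where `i` stands for the literal $x_i$ and `-i` for $\neg x_i$. The example formula is the one given in the caption of Figure 34.19.

```python
def cnf_to_subset_sum(n, clauses):
    """Build the subset-sum instance (S, t) for a 3-CNF formula.

    n is the number of variables; clauses is a list of tuples of nonzero
    ints, with i meaning x_i and -i meaning NOT x_i (an assumed encoding).
    S is returned in the order v_1, v_1', ..., v_n, v_n', s_1, s_1', ..., s_k, s_k'.
    """
    k = len(clauses)

    def number(var_index, clause_digits):
        # Most significant n digits: variables; least significant k: clauses.
        digits = [0] * (n + k)
        if var_index is not None:
            digits[var_index] = 1
        for j, d in clause_digits.items():
            digits[n + j] = d
        return int("".join(map(str, digits)))  # read the digits in base 10

    S = []
    for i in range(1, n + 1):
        pos = {j: 1 for j, c in enumerate(clauses) if i in c}
        neg = {j: 1 for j, c in enumerate(clauses) if -i in c}
        S.append(number(i - 1, pos))   # v_i
        S.append(number(i - 1, neg))   # v_i'
    for j in range(k):
        S.append(number(None, {j: 1}))  # s_j
        S.append(number(None, {j: 2}))  # s_j'
    t = int("1" * n + "4" * k)
    return S, t

# Formula from the caption of Figure 34.19.
fig_clauses = [(1, -2, -3), (-1, -2, -3), (-1, -2, 3), (1, 2, 3)]
fig_S, fig_t = cnf_to_subset_sum(3, fig_clauses)
```

Each number is assembled from $n + k$ digits, so the whole construction takes time polynomial in $n + k$, in line with the running-time argument above.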

We now show that the 3-CNF formula $\phi$ is satisfiable if and only if there
exists a subset $S' \subseteq S$ whose sum is $t$. First, suppose that $\phi$ has
a satisfying assignment. For $i = 1, 2, \ldots, n$, if $x_i = 1$ in this
assignment, then include $v_i$ in $S'$. Otherwise, include $v_i'$. In other
words, we include in $S'$ exactly the $v_i$ and $v_i'$ values that correspond to
literals with the value 1 in the satisfying assignment. Having included either
$v_i$ or $v_i'$, but not both, for all $i$, and having put 0 in the digits
labeled by variables in all $s_j$ and $s_j'$, we see that for each
variable-labeled digit, the sum of the values of $S'$ must be 1, which matches
those digits of the target $t$. Because each clause is satisfied, the clause
contains some literal with the value 1. Therefore, each digit labeled by a clause
has at least one 1 contributed to its sum by a $v_i$ or $v_i'$ value in $S'$. In
fact, 1, 2, or 3 literals may be 1 in each clause, and so each clause-labeled
digit has a sum of 1, 2, or 3 from the $v_i$ and $v_i'$ values in $S'$. In
Figure 34.19, for example, literals $\neg x_1$, $\neg x_2$, and $x_3$ have the
value 1 in a satisfying assignment. Each of clauses $C_1$ and $C_4$ contains
exactly one of these literals, and so together $v_1'$, $v_2'$, and $v_3$
contribute 1 to the sum in the digits for $C_1$ and $C_4$. Clause $C_2$ contains
two of these literals, and $v_1'$, $v_2'$, and $v_3$ contribute 2 to the sum in
the digit for $C_2$. Clause $C_3$ contains all three of these literals, and
$v_1'$, $v_2'$, and $v_3$ contribute 3 to the sum in the digit for $C_3$. We
achieve the target of 4 in each digit labeled by clause $C_j$ by including in
$S'$ the appropriate nonempty subset of slack variables $\{s_j, s_j'\}$. In
Figure 34.19, $S'$ includes $s_1$, $s_1'$, $s_2'$, $s_3$, $s_4$, and $s_4'$.
Since we have matched the target in all digits of the sum, and no carries can
occur, the values of $S'$ sum to $t$.

$^{11}$ In fact, any base $b$, where $b \ge 7$, would work. The instance at the
beginning of this subsection is the set $S$ and target $t$ in Figure 34.19
interpreted in base 7, with $S$ listed in sorted order.

Now, suppose that there is a subset $S' \subseteq S$ that sums to $t$. The subset
$S'$ must include exactly one of $v_i$ and $v_i'$ for each
$i = 1, 2, \ldots, n$, for otherwise the digits labeled by variables would not
sum to 1. If $v_i \in S'$, we set $x_i = 1$. Otherwise, $v_i' \in S'$, and we set
$x_i = 0$. We claim that every clause $C_j$, for $j = 1, 2, \ldots, k$, is
satisfied by this assignment. To prove this claim, note that to achieve a sum of
4 in the digit labeled by $C_j$, the subset $S'$ must include at least one $v_i$
or $v_i'$ value that has a 1 in the digit labeled by $C_j$, since the
contributions of the slack variables $s_j$ and $s_j'$ together sum to at most 3.
If $S'$ includes a $v_i$ that has a 1 in $C_j$'s position, then the literal
$x_i$ appears in clause $C_j$. Since we have set $x_i = 1$ when $v_i \in S'$,
clause $C_j$ is satisfied. If $S'$ includes a $v_i'$ that has a 1 in that
position, then the literal $\neg x_i$ appears in $C_j$. Since we have set
$x_i = 0$ when $v_i' \in S'$, clause $C_j$ is again satisfied. Thus, all clauses
of $\phi$ are satisfied, which completes the proof. ■


Exercises 


34.5-1 

The subgraph-isomorphism problem takes two undirected graphs $G_1$ and $G_2$, and
it asks whether $G_1$ is isomorphic to a subgraph of $G_2$. Show that the
subgraph-isomorphism problem is NP-complete.


34.5-2 

Given an integer $m \times n$ matrix $A$ and an integer $m$-vector $b$, the 0-1
integer-programming problem asks whether there exists an integer $n$-vector $x$
with elements in the set $\{0, 1\}$ such that $Ax \le b$. Prove that 0-1 integer
programming is NP-complete. (Hint: Reduce from 3-CNF-SAT.)


34.5-3 

The  integer  linear-programming  problem  is  like  the  0-1  integer-programming 
problem  given  in  Exercise  34.5-2,  except  that  the  values  of  the  vector  x  may  be 
any  integers  rather  than  just  0  or  1.  Assuming  that  the  0-1  integer-programming 
problem  is  NP-hard,  show  that  the  integer  linear-programming  problem  is  NP- 
complete. 


34.5-4 

Show  how  to  solve  the  subset-sum  problem  in  polynomial  time  if  the  target  value  t 
is  expressed  in  unary. 


34.5-5

The set-partition problem takes as input a set $S$ of numbers. The question is
whether the numbers can be partitioned into two sets $A$ and
$\overline{A} = S - A$ such that
$\sum_{x \in A} x = \sum_{x \in \overline{A}} x$. Show that the set-partition
problem is NP-complete.

34.5-6

Show  that  the  hamiltonian-path  problem  is  NP-complete. 


34.5-7 

The  longest-simple-cycle  problem  is  the  problem  of  determining  a  simple  cycle 
(no  repeated  vertices)  of  maximum  length  in  a  graph.  Formulate  a  related  decision 
problem,  and  show  that  the  decision  problem  is  NP-complete. 


34.5-8 

In the half 3-CNF satisfiability problem, we are given a 3-CNF formula $\phi$
with $n$ variables and $m$ clauses, where $m$ is even. We wish to determine
whether there exists a truth assignment to the variables of $\phi$ such that
exactly half the clauses evaluate to 0 and exactly half the clauses evaluate
to 1. Prove that the half 3-CNF satisfiability problem is NP-complete.


Problems 


34-1  Independent  set 

An independent set of a graph $G = (V, E)$ is a subset $V' \subseteq V$ of
vertices such that each edge in $E$ is incident on at most one vertex in $V'$.
The independent-set problem is to find a maximum-size independent set in $G$.

a.  Formulate a related decision problem for the independent-set problem, and
prove that it is NP-complete. (Hint: Reduce from the clique problem.)

b.  Suppose that you are given a "black-box" subroutine to solve the decision
problem you defined in part (a). Give an algorithm to find an independent set of
maximum size. The running time of your algorithm should be polynomial in $|V|$
and $|E|$, counting queries to the black box as a single step.

Although  the  independent-set  decision  problem  is  NP-complete,  certain  special 
cases  are  polynomial-time  solvable. 

c.  Give an efficient algorithm to solve the independent-set problem when each
vertex in $G$ has degree 2. Analyze the running time, and prove that your
algorithm works correctly.

d.  Give an efficient algorithm to solve the independent-set problem when $G$ is
bipartite. Analyze the running time, and prove that your algorithm works
correctly. (Hint: Use the results of Section 26.3.)

34-2  Bonnie  and  Clyde 

Bonnie  and  Clyde  have  just  robbed  a  bank.  They  have  a  bag  of  money  and  want 
to  divide  it  up.  For  each  of  the  following  scenarios,  either  give  a  polynomial-time 
algorithm,  or  prove  that  the  problem  is  NP-complete.  The  input  in  each  case  is  a 
list  of  the  n  items  in  the  bag,  along  with  the  value  of  each. 

a.  The  bag  contains  n  coins,  but  only  2  different  denominations:  some  coins  are 
worth  x  dollars,  and  some  are  worth  y  dollars.  Bonnie  and  Clyde  wish  to  divide 
the  money  exactly  evenly. 

b.  The  bag  contains  n  coins,  with  an  arbitrary  number  of  different  denominations, 
but  each  denomination  is  a  nonnegative  integer  power  of  2,  i.e.,  the  possible 
denominations  are  1  dollar,  2  dollars,  4  dollars,  etc.  Bonnie  and  Clyde  wish  to 
divide  the  money  exactly  evenly. 

c.  The  bag  contains  n  checks,  which  are,  in  an  amazing  coincidence,  made  out  to 
“Bonnie  or  Clyde.”  They  wish  to  divide  the  checks  so  that  they  each  get  the 
exact  same  amount  of  money. 

d.  The bag contains n checks as in part (c), but this time Bonnie and Clyde are
willing to accept a split in which the difference is no larger than 100 dollars.

34-3  Graph coloring

Mapmakers try to use as few colors as possible when coloring countries on a map,
as long as no two countries that share a border have the same color. We can model
this problem with an undirected graph $G = (V, E)$ in which each vertex
represents a country and vertices whose respective countries share a border are
adjacent. Then, a k-coloring is a function $c : V \to \{1, 2, \ldots, k\}$ such
that $c(u) \ne c(v)$ for every edge $(u, v) \in E$. In other words, the numbers
$1, 2, \ldots, k$ represent the $k$ colors, and adjacent vertices must have
different colors. The graph-coloring problem is to determine the minimum number
of colors needed to color a given graph.

a.  Give  an  efficient  algorithm  to  determine  a  2-coloring  of  a  graph,  if  one  exists. 

b.  Cast the graph-coloring problem as a decision problem. Show that your
decision problem is solvable in polynomial time if and only if the graph-coloring
problem is solvable in polynomial time.

c.  Let  the  language  3-COLOR  be  the  set  of  graphs  that  can  be  3-colored.  Show 
that  if  3-COLOR  is  NP-complete,  then  your  decision  problem  from  part  (b)  is 
NP-complete. 

To prove that 3-COLOR is NP-complete, we use a reduction from 3-CNF-SAT.
Given a formula $\phi$ of $m$ clauses on $n$ variables $x_1, x_2, \ldots, x_n$,
we construct a graph $G = (V, E)$ as follows. The set $V$ consists of a vertex
for each variable, a vertex for the negation of each variable, 5 vertices for
each clause, and 3 special vertices: TRUE, FALSE, and RED. The edges of the graph
are of two types: "literal" edges that are independent of the clauses and
"clause" edges that depend on the clauses. The literal edges form a triangle on
the special vertices and also form a triangle on $x_i$, $\neg x_i$, and RED for
$i = 1, 2, \ldots, n$.

d.  Argue that in any 3-coloring $c$ of a graph containing the literal edges,
exactly one of a variable and its negation is colored $c$(TRUE) and the other is
colored $c$(FALSE). Argue that for any truth assignment for $\phi$, there exists
a 3-coloring of the graph containing just the literal edges.

The  widget  shown  in  Figure  34.20  helps  to  enforce  the  condition  corresponding  to 
a clause (x ∨ y ∨ z). Each clause requires a unique copy of the 5 vertices that are
heavily  shaded  in  the  figure;  they  connect  as  shown  to  the  literals  of  the  clause  and 
the  special  vertex  TRUE. 

e. Argue that if each of x, y, and z is colored c(TRUE) or c(FALSE), then the
widget is 3-colorable if and only if at least one of x, y, or z is colored c(TRUE).

f. Complete the proof that 3-COLOR is NP-complete.

Figure 34.20 The widget corresponding to a clause (x ∨ y ∨ z), used in Problem 34-3.

34-4  Scheduling  with  profits  and  deadlines 

Suppose that we have one machine and a set of n tasks a_1, a_2, ..., a_n, each of
which requires time on the machine. Each task a_j requires t_j time units on the
machine (its processing time), yields a profit of p_j, and has a deadline d_j. The
machine can process only one task at a time, and task a_j must run without
interruption for t_j consecutive time units. If we complete task a_j by its deadline d_j, we
receive a profit p_j, but if we complete it after its deadline, we receive no profit. As
an  optimization  problem,  we  are  given  the  processing  times,  profits,  and  deadlines 
for  a  set  of  n  tasks,  and  we  wish  to  find  a  schedule  that  completes  all  the  tasks  and 
returns  the  greatest  amount  of  profit.  The  processing  times,  profits,  and  deadlines 
are  all  nonnegative  numbers. 

a.  State  this  problem  as  a  decision  problem. 

b.  Show  that  the  decision  problem  is  NP-complete. 

c.  Give  a  polynomial-time  algorithm  for  the  decision  problem,  assuming  that  all 
processing  times  are  integers  from  1  to  n.  (Hint:  Use  dynamic  programming.) 

d.  Give  a  polynomial-time  algorithm  for  the  optimization  problem,  assuming  that 
all  processing  times  are  integers  from  1  to  n. 


Chapter  notes 

The book by Garey and Johnson [129] provides a wonderful guide to NP-completeness,
discussing the theory at length and providing a catalogue of many problems
that were known to be NP-complete in 1979. The proof of Theorem 34.13 is
adapted from their book, and the list of NP-complete problem domains at the beginning
of Section 34.5 is drawn from their table of contents. Johnson wrote a series
of 23 columns in the Journal of Algorithms between 1981 and 1992 reporting new
developments in NP-completeness. Hopcroft, Motwani, and Ullman [177], Lewis
and Papadimitriou [236], Papadimitriou [270], and Sipser [317] have good treatments
of NP-completeness in the context of complexity theory. NP-completeness
and several reductions also appear in books by Aho, Hopcroft, and Ullman [5];
Dasgupta, Papadimitriou, and Vazirani [82]; Johnsonbaugh and Schaefer [193];
and Kleinberg and Tardos [208].

The class P was introduced in 1964 by Cobham [72] and, independently, in 1965
by Edmonds [100], who also introduced the class NP and conjectured that P ≠ NP.
The notion of NP-completeness was proposed in 1971 by Cook [75], who gave
the first NP-completeness proofs for formula satisfiability and 3-CNF satisfiability.
Levin [234] independently discovered the notion, giving an NP-completeness
proof for a tiling problem. Karp [199] introduced the methodology of reductions
in 1972 and demonstrated the rich variety of NP-complete problems. Karp’s paper
included the original NP-completeness proofs of the clique, vertex-cover, and
hamiltonian-cycle problems. Since then, thousands of problems have been proven
to be NP-complete by many researchers. In a talk at a meeting celebrating Karp’s
60th birthday in 1995, Papadimitriou remarked, “about 6000 papers each year have
the term ‘NP-complete’ on their title, abstract, or list of keywords. This is more
than each of the terms ‘compiler,’ ‘database,’ ‘expert,’ ‘neural network,’ or ‘operating
system.’ ”

Recent work in complexity theory has shed light on the complexity of computing
approximate solutions. This work gives a new definition of NP using “probabilistically
checkable proofs.” This new definition implies that for problems such as
clique, vertex cover, the traveling-salesman problem with the triangle inequality,
and many others, computing good approximate solutions is NP-hard and hence no
easier than computing optimal solutions. An introduction to this area can be found
in Arora’s thesis [20]; a chapter by Arora and Lund in Hochbaum [172]; a survey
article by Arora [21]; a book edited by Mayr, Prömel, and Steger [246]; and a
survey article by Johnson [191].

Many problems of practical significance are NP-complete, yet they are too important
to abandon merely because we don’t know how to find an optimal solution in
polynomial time. Even if a problem is NP-complete, there may be hope. We have at
least three ways to get around NP-completeness. First, if the actual inputs are small,
an algorithm with exponential running time may be perfectly satisfactory. Second,
we may be able to isolate important special cases that we can solve in polynomial
time. Third, we might come up with approaches to find near-optimal solutions in
polynomial time (either in the worst case or the expected case). In practice,
near-optimality is often good enough. We call an algorithm that returns near-optimal
solutions an approximation algorithm. This chapter presents polynomial-time
approximation algorithms for several NP-complete problems.

Performance  ratios  for  approximation  algorithms 

Suppose  that  we  are  working  on  an  optimization  problem  in  which  each  potential 
solution  has  a  positive  cost,  and  we  wish  to  find  a  near-optimal  solution.  Depending 
on  the  problem,  we  may  define  an  optimal  solution  as  one  with  maximum  possi¬ 
ble  cost  or  one  with  minimum  possible  cost;  that  is,  the  problem  may  be  either  a 
maximization  or  a  minimization  problem. 

We say that an algorithm for a problem has an approximation ratio of ρ(n) if,
for any input of size n, the cost C of the solution produced by the algorithm is
within a factor of ρ(n) of the cost C* of an optimal solution:

max(C/C*, C*/C) ≤ ρ(n) . (35.1)

If an algorithm achieves an approximation ratio of ρ(n), we call it a ρ(n)-approximation
algorithm. The definitions of the approximation ratio and of a ρ(n)-approximation
algorithm apply to both minimization and maximization problems.
For a maximization problem, 0 < C ≤ C*, and the ratio C*/C gives the factor
by which the cost of an optimal solution is larger than the cost of the approximate
solution. Similarly, for a minimization problem, 0 < C* ≤ C, and the ratio C/C*
gives the factor by which the cost of the approximate solution is larger than the
cost of an optimal solution. Because we assume that all solutions have positive
cost, these ratios are always well defined. The approximation ratio of an approximation
algorithm is never less than 1, since C/C* ≤ 1 implies C*/C ≥ 1.
Therefore, a 1-approximation algorithm¹ produces an optimal solution, and an
approximation algorithm with a large approximation ratio may return a solution that
is much worse than optimal.
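
The quantity bounded in (35.1) is easy to compute once both costs are known. The following Python sketch is only an illustration; the function name achieved_ratio and the sample costs are assumptions made for the example.

```python
def achieved_ratio(cost, optimal_cost):
    """Return max(C/C*, C*/C) for positive costs C and C*."""
    if cost <= 0 or optimal_cost <= 0:
        raise ValueError("both costs must be positive")
    return max(cost / optimal_cost, optimal_cost / cost)


# Minimization example: a solution of cost 6 when the optimum costs 3.
print(achieved_ratio(6, 3))   # 2.0
# Maximization example (hypothetical numbers): value 7 when the optimum is 8.
print(achieved_ratio(7, 8))
# An optimal solution always has ratio exactly 1.
print(achieved_ratio(5, 5))   # 1.0
```

Taking the maximum of the two quotients is what lets a single definition serve both minimization and maximization problems.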

For  many  problems,  we  have  polynomial-time  approximation  algorithms  with 
small  constant  approximation  ratios,  although  for  other  problems,  the  best  known 
polynomial-time  approximation  algorithms  have  approximation  ratios  that  grow 
as functions of the input size n. An example of such a problem is the set-cover
problem  presented  in  Section  35.3. 

Some  NP-complete  problems  allow  polynomial-time  approximation  algorithms 
that  can  achieve  increasingly  better  approximation  ratios  by  using  more  and  more 
computation  time.  That  is,  we  can  trade  computation  time  for  the  quality  of  the 
approximation.  An  example  is  the  subset-sum  problem  studied  in  Section  35.5. 
This  situation  is  important  enough  to  deserve  a  name  of  its  own. 

An approximation scheme for an optimization problem is an approximation algorithm
that takes as input not only an instance of the problem, but also a value
ε > 0 such that for any fixed ε, the scheme is a (1 + ε)-approximation algorithm.
We say that an approximation scheme is a polynomial-time approximation scheme
if for any fixed ε > 0, the scheme runs in time polynomial in the size n of its input
instance.

The running time of a polynomial-time approximation scheme can increase very
rapidly as ε decreases. For example, the running time of a polynomial-time
approximation scheme might be O(n^(2/ε)). Ideally, if ε decreases by a constant factor,
the running time to achieve the desired approximation should not increase by more
than a constant factor (though not necessarily the same constant factor by which ε
decreased).

We say that an approximation scheme is a fully polynomial-time approximation
scheme if it is an approximation scheme and its running time is polynomial in
both 1/ε and the size n of the input instance. For example, the scheme might have
a running time of O((1/ε)² n³). With such a scheme, any constant-factor decrease
in ε comes with a corresponding constant-factor increase in the running time.
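
To see the difference numerically, the Python sketch below compares the two example running-time expressions, n^(2/ε) and (1/ε)² n³, when ε is halved from 0.5 to 0.25. It ignores the constants hidden by the O-notation, and the function names and sample values are assumptions chosen for illustration.

```python
def ptas_steps(n, eps):
    """Example running-time expression n^(2/eps) of a polynomial-time scheme."""
    return n ** (2 / eps)


def fptas_steps(n, eps):
    """Example running-time expression (1/eps)^2 * n^3 of a fully polynomial-time scheme."""
    return (1 / eps) ** 2 * n ** 3


for n in (10, 100):
    # Growth factor of each expression when eps is halved from 0.5 to 0.25.
    ptas_growth = ptas_steps(n, 0.25) / ptas_steps(n, 0.5)
    fptas_growth = fptas_steps(n, 0.25) / fptas_steps(n, 0.5)
    print(n, ptas_growth, fptas_growth)
```

Halving ε multiplies the second expression by 4 regardless of n, whereas the first is multiplied by n^8 / n^4 = n^4, a factor that itself grows with the input size.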


¹When the approximation ratio is independent of n, we use the terms “approximation ratio of ρ” and
“ρ-approximation algorithm,” indicating no dependence on n.

Chapter outline

The first four sections of this chapter present some examples of polynomial-time
approximation algorithms for NP-complete problems, and the fifth section presents
a fully polynomial-time approximation scheme. Section 35.1 begins with a study
of the vertex-cover problem, an NP-complete minimization problem that has an
approximation algorithm with an approximation ratio of 2. Section 35.2 presents
an approximation algorithm with an approximation ratio of 2 for the case of the
traveling-salesman problem in which the cost function satisfies the triangle
inequality. It also shows that without the triangle inequality, for any constant ρ ≥ 1,
a ρ-approximation algorithm cannot exist unless P = NP. In Section 35.3, we
show how to use a greedy method as an effective approximation algorithm for the
set-covering problem, obtaining a covering whose cost is at worst a logarithmic
factor larger than the optimal cost. Section 35.4 presents two more approximation
algorithms. First we study the optimization version of 3-CNF satisfiability and
give a simple randomized algorithm that produces a solution with an expected
approximation ratio of 8/7. Then we examine a weighted variant of the vertex-cover
problem and show how to use linear programming to develop a 2-approximation
algorithm. Finally, Section 35.5 presents a fully polynomial-time approximation
scheme for the subset-sum problem.


35.1  The  vertex-cover  problem 

Section 34.5.2 defined the vertex-cover problem and proved it NP-complete. Recall
that a vertex cover of an undirected graph G = (V, E) is a subset V' ⊆ V such
that if (u, v) is an edge of G, then either u ∈ V' or v ∈ V' (or both). The size of a
vertex  cover  is  the  number  of  vertices  in  it. 
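
The definition translates directly into a check. The Python sketch below assumes the graph is given as a list of edges; the function name is_vertex_cover and the four-vertex path used as an example are hypothetical.

```python
def is_vertex_cover(edges, cover):
    """Return True if every edge has at least one endpoint in `cover`."""
    cover = set(cover)
    return all(u in cover or v in cover for u, v in edges)


# Hypothetical example: the path a - b - c - d.
path = [("a", "b"), ("b", "c"), ("c", "d")]
print(is_vertex_cover(path, {"b", "c"}))  # True: a cover of size 2
print(is_vertex_cover(path, {"a", "d"}))  # False: edge (b, c) is uncovered
```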

The vertex-cover problem is to find a vertex cover of minimum size in a given
undirected graph. We call such a vertex cover an optimal vertex cover. This problem
is the optimization version of an NP-complete decision problem.

Even  though  we  don’t  know  how  to  find  an  optimal  vertex  cover  in  a  graph  G 
in  polynomial  time,  we  can  efficiently  find  a  vertex  cover  that  is  near-optimal. 
The  following  approximation  algorithm  takes  as  input  an  undirected  graph  G  and 
returns  a  vertex  cover  whose  size  is  guaranteed  to  be  no  more  than  twice  the  size 
of  an  optimal  vertex  cover. 



Figure 35.1 The operation of Approx-Vertex-Cover. (a) The input graph G, which has 7
vertices and 8 edges. (b) The edge (b, c), shown heavy, is the first edge chosen by
Approx-Vertex-Cover. Vertices b and c, shown lightly shaded, are added to the set C containing
the vertex cover being created. Edges (a, b), (c, e), and (c, d), shown dashed, are removed since
they are now covered by some vertex in C. (c) Edge (e, f) is chosen; vertices e and f are added
to C. (d) Edge (d, g) is chosen; vertices d and g are added to C. (e) The set C, which is the
vertex cover produced by Approx-Vertex-Cover, contains the six vertices b, c, d, e, f, g.
(f) The optimal vertex cover for this problem contains only three vertices: b, d, and e.


Approx-Vertex-Cover(G)
1  C = ∅
2  E' = G.E
3  while E' ≠ ∅
4      let (u, v) be an arbitrary edge of E'
5      C = C ∪ {u, v}
6      remove from E' every edge incident on either u or v
7  return C

Figure 35.1 illustrates how Approx-Vertex-Cover operates on an example
graph. The variable C contains the vertex cover being constructed. Line 1 initializes
C to the empty set. Line 2 sets E' to be a copy of the edge set G.E of
the graph. The loop of lines 3-6 repeatedly picks an edge (u, v) from E', adds its
endpoints u and v to C, and deletes all edges in E' that are covered by either u
or v. Finally, line 7 returns the vertex cover C. The running time of this algorithm
is O(V + E), using adjacency lists to represent E'.
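
The pseudocode can be sketched in Python as follows. Rather than deleting edges from E' explicitly, this version scans the edge list once and treats an edge as still present in E' exactly when neither endpoint is in C yet, which has the same effect for a simple graph. The function name, the edge-list representation, and the edge list itself are assumptions: the edges were reconstructed to be consistent with the caption of Figure 35.1 and ordered so that the arbitrary choices match it.

```python
def approx_vertex_cover(edges):
    """Return (cover, picked): a vertex cover and the edges chosen to build it."""
    cover = set()
    picked = []
    for u, v in edges:
        # The edge is still in E' only if neither endpoint has been added to C.
        if u not in cover and v not in cover:
            picked.append((u, v))   # line 4: choose this edge
            cover.update((u, v))    # line 5: add both endpoints to C
            # Line 6 is implicit: edges touching u or v now fail the test above.
    return cover, picked


# Assumed edge list for the graph of Figure 35.1 (7 vertices, 8 edges).
edges = [("b", "c"), ("e", "f"), ("d", "g"), ("a", "b"),
         ("c", "e"), ("c", "d"), ("d", "e"), ("d", "f")]
cover, picked = approx_vertex_cover(edges)
print(sorted(cover))  # ['b', 'c', 'd', 'e', 'f', 'g']
print(picked)         # [('b', 'c'), ('e', 'f'), ('d', 'g')]
```

A different edge order may select different edges and return a different cover, since the pseudocode leaves the choice in line 4 arbitrary.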

Theorem  35.1 

Approx-Vertex-Cover  is  a  polynomial-time  2-approximation  algorithm. 

Proof We have already shown that Approx-Vertex-Cover runs in polynomial
time.

The  set  C  of  vertices  that  is  returned  by  Approx-Vertex-Cover  is  a  vertex 
cover,  since  the  algorithm  loops  until  every  edge  in  G.E  has  been  covered  by  some 
vertex  in  C . 

To see that Approx-Vertex-Cover returns a vertex cover that is at most twice
the size of an optimal cover, let A denote the set of edges that line 4 of
Approx-Vertex-Cover picked. In order to cover the edges in A, any vertex cover (in
particular, an optimal cover C*) must include at least one endpoint of each edge
in A. No two edges in A share an endpoint, since once an edge is picked in line 4,
all other edges that are incident on its endpoints are deleted from E' in line 6. Thus,
no two edges in A are covered by the same vertex from C*, and we have the lower
bound

|C*| ≥ |A| (35.2)

on  the  size  of  an  optimal  vertex  cover.  Each  execution  of  line  4  picks  an  edge  for 
which  neither  of  its  endpoints  is  already  in  C ,  yielding  an  upper  bound  (an  exact 
upper  bound,  in  fact)  on  the  size  of  the  vertex  cover  returned: 

|C|=2|A|.  (35.3) 

Combining  equations  (35.2)  and  (35.3),  we  obtain 

|C| = 2|A|
    ≤ 2|C*| ,

thereby  proving  the  theorem.  ■ 

Let  us  reflect  on  this  proof.  At  first,  you  might  wonder  how  we  can  possibly 
prove  that  the  size  of  the  vertex  cover  returned  by  Approx-Vertex-Cover  is  at 
most  twice  the  size  of  an  optimal  vertex  cover,  when  we  do  not  even  know  the  size 
of  an  optimal  vertex  cover.  Instead  of  requiring  that  we  know  the  exact  size  of  an 
optimal  vertex  cover,  we  rely  on  a  lower  bound  on  the  size.  As  Exercise  35.1-2  asks 
you  to  show,  the  set  A  of  edges  that  line  4  of  Approx-Vertex-Cover  selects  is 
actually  a  maximal  matching  in  the  graph  G.  (A  maximal  matching  is  a  matching 
that  is  not  a  proper  subset  of  any  other  matching.)  The  size  of  a  maximal  matching 
is, as we argued in the proof of Theorem 35.1, a lower bound on the size of an
optimal  vertex  cover.  The  algorithm  returns  a  vertex  cover  whose  size  is  at  most 
twice  the  size  of  the  maximal  matching  A.  By  relating  the  size  of  the  solution 
returned  to  the  lower  bound,  we  obtain  our  approximation  ratio.  We  will  use  this 
methodology  in  later  sections  as  well. 
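
The chain |A| ≤ |C*| ≤ |C| = 2|A| can be checked by brute force on a small graph. The Python sketch below is an illustration only: the function names are our own, the exhaustive search takes exponential time, and the edge list is an assumed reconstruction of the graph of Figure 35.1.

```python
from itertools import combinations


def greedy_maximal_matching(edges):
    """Pick edges greedily so that no two picked edges share an endpoint."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


def optimal_cover_size(edges):
    """Size of a minimum vertex cover, found by trying all subsets (tiny graphs only)."""
    vertices = sorted({x for edge in edges for x in edge})
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            chosen = set(candidate)
            if all(u in chosen or v in chosen for u, v in edges):
                return size


# Assumed edge list for the graph of Figure 35.1.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"),
         ("d", "e"), ("d", "f"), ("d", "g"), ("e", "f")]
matching = greedy_maximal_matching(edges)
optimum = optimal_cover_size(edges)
# Lower bound |A| <= |C*| and upper bound |C*| <= 2|A|.
print(len(matching), optimum, 2 * len(matching))
```

The algorithm itself never computes the optimum; only the matching, which is cheap to find, enters the guarantee.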

Exercises 


35.1-1 

Give  an  example  of  a  graph  for  which  Approx-Vertex-Cover  always  yields  a 
suboptimal  solution. 


35.1-2

Prove  that  the  set  of  edges  picked  in  line  4  of  Approx-Vertex-Cover  forms  a 
maximal  matching  in  the  graph  G. 

35.1-3 ★

Professor Bündchen proposes the following heuristic to solve the vertex-cover
problem. Repeatedly select a vertex of highest degree, and remove all of its incident
edges. Give an example to show that the professor’s heuristic does not have
an approximation ratio of 2. (Hint: Try a bipartite graph with vertices of uniform
degree on the left and vertices of varying degree on the right.)


35.1-4 

Give an efficient greedy algorithm that finds an optimal vertex cover for a tree in
linear time.


35.1-5 

From  the  proof  of  Theorem  34.12,  we  know  that  the  vertex-cover  problem  and  the 
NP-complete  clique  problem  are  complementary  in  the  sense  that  an  optimal  vertex 
cover  is  the  complement  of  a  maximum-size  clique  in  the  complement  graph.  Does 
this  relationship  imply  that  there  is  a  polynomial-time  approximation  algorithm 
with  a  constant  approximation  ratio  for  the  clique  problem?  Justify  your  answer. 


35.2  The  traveling-salesman  problem 

In the traveling-salesman problem introduced in Section 34.5.4, we are given a
complete undirected graph G = (V, E) that has a nonnegative integer cost c(u, v)
associated with each edge (u, v) ∈ E, and we must find a hamiltonian cycle (a
tour) of G with minimum cost. As an extension of our notation, let c(A) denote
the total cost of the edges in the subset A ⊆ E:

c(A) = Σ_{(u,v) ∈ A} c(u, v) .

In many practical situations, the least costly way to go from a place u to a place w
is to go directly, with no intermediate steps. Put another way, cutting out an
intermediate stop never increases the cost. We formalize this notion by saying that the
cost function c satisfies the triangle inequality if, for all vertices u, v, w ∈ V,

c(u, w) ≤ c(u, v) + c(v, w) .

The triangle inequality seems as though it should naturally hold, and it is
automatically satisfied in several applications. For example, if the vertices of the
graph  are  points  in  the  plane  and  the  cost  of  traveling  between  two  vertices  is  the 
ordinary  euclidean  distance  between  them,  then  the  triangle  inequality  is  satisfied. 
Furthermore,  many  cost  functions  other  than  euclidean  distance  satisfy  the  triangle 
inequality. 
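
For a small instance, the triangle inequality can be tested directly from its definition. The Python sketch below is a minimal illustration; the function name, the tolerance parameter tol (added to absorb floating-point rounding), and both example cost functions are assumptions.

```python
from itertools import permutations
from math import dist


def satisfies_triangle_inequality(vertices, cost, tol=1e-12):
    """Check c(u, w) <= c(u, v) + c(v, w) for all distinct u, v, w."""
    return all(cost(u, w) <= cost(u, v) + cost(v, w) + tol
               for u, v, w in permutations(vertices, 3))


# Euclidean distance between points in the plane (hypothetical coordinates).
points = {"a": (0, 0), "b": (3, 0), "c": (3, 4)}


def euclidean(u, v):
    return dist(points[u], points[v])


print(satisfies_triangle_inequality(points, euclidean))  # True

# A hypothetical cost table in which the direct edge (a, c) is overpriced.
table = {frozenset("ab"): 1, frozenset("bc"): 1, frozenset("ac"): 10}


def lookup(u, v):
    return table[frozenset((u, v))]


print(satisfies_triangle_inequality("abc", lookup))  # False
```

The check examines every ordered triple of distinct vertices, so it takes time cubic in the number of vertices.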

As  Exercise  35.2-2  shows,  the  traveling-salesman  problem  is  NP-complete  even 
if  we  require  that  the  cost  function  satisfy  the  triangle  inequality.  Thus,  we  should 
not  expect  to  find  a  polynomial-time  algorithm  for  solving  this  problem  exactly. 
Instead,  we  look  for  good  approximation  algorithms. 

In  Section  35.2.1,  we  examine  a  2-approximation  algorithm  for  the  traveling- 
salesman  problem  with  the  triangle  inequality.  In  Section  35.2.2,  we  show  that 
without  the  triangle  inequality,  a  polynomial-time  approximation  algorithm  with  a 
constant  approximation  ratio  does  not  exist  unless  P  =  NP. 

35.2.1  The  traveling-salesman  problem  with  the  triangle  inequality 

Applying the methodology of the previous section, we shall first compute a
structure (a minimum spanning tree) whose weight gives a lower bound on the length
of an optimal traveling-salesman tour. We shall then use the minimum spanning
tree to create a tour whose cost is no more than twice the minimum spanning
tree’s weight, as long as the cost function satisfies the triangle inequality. The
following algorithm implements this approach, calling the minimum-spanning-tree
algorithm MST-Prim from Section 23.2 as a subroutine. The parameter G is a
complete undirected graph, and the cost function c satisfies the triangle inequality.

Approx-TSP-Tour(G, c)
1  select a vertex r ∈ G.V to be a “root” vertex
2  compute a minimum spanning tree T for G from root r
       using MST-Prim(G, c, r)
3  let H be a list of vertices, ordered according to when they are first visited
       in a preorder tree walk of T
4  return the hamiltonian cycle H

Figure 35.2 The operation of Approx-TSP-Tour. (a) A complete undirected graph. Vertices lie
on intersections of integer grid lines. For example, f is one unit to the right and two units up from h.
The cost function between two points is the ordinary euclidean distance. (b) A minimum spanning
tree T of the complete graph, as computed by MST-Prim. Vertex a is the root vertex. Only edges
in the minimum spanning tree are shown. The vertices happen to be labeled in such a way that they
are added to the main tree by MST-Prim in alphabetical order. (c) A walk of T, starting at a. A
full walk of the tree visits the vertices in the order a, b, c, b, h, b, a, d, e, f, e, g, e, d, a. A preorder
walk of T lists a vertex just when it is first encountered, as indicated by the dot next to each vertex,
yielding the ordering a, b, c, h, d, e, f, g. (d) A tour obtained by visiting the vertices in the order
given by the preorder walk, which is the tour H returned by Approx-TSP-Tour. Its total cost
is approximately 19.074. (e) An optimal tour H* for the original complete graph. Its total cost is
approximately 14.715.

Recall  from  Section  12.1  that  a  preorder  tree  walk  recursively  visits  every  vertex 
in  the  tree,  listing  a  vertex  when  it  is  first  encountered,  before  visiting  any  of  its 
children. 

Figure 35.2 illustrates the operation of Approx-TSP-Tour. Part (a) of the figure
shows a complete undirected graph, and part (b) shows the minimum spanning
tree T grown from root vertex a by MST-Prim. Part (c) shows how a preorder
walk of T visits the vertices, and part (d) displays the corresponding tour, which is
the tour returned by Approx-TSP-Tour. Part (e) displays an optimal tour, which
is about 23% shorter.
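
The procedure can be sketched in Python as follows. This is one possible rendering under stated assumptions, not the book's code: Prim's algorithm is written in its simple quadratic form, the cost function is passed as a Python callable, and the point coordinates are hypothetical (they are not those of Figure 35.2).

```python
from math import dist


def approx_tsp_tour(vertices, cost, root):
    """Return the vertices in preorder of a minimum spanning tree rooted at `root`."""
    # Simple O(V^2) form of Prim's algorithm on the complete graph.
    key = {v: float("inf") for v in vertices}
    parent = {v: None for v in vertices}
    children = {v: [] for v in vertices}
    key[root] = 0
    remaining = list(vertices)
    while remaining:
        u = min(remaining, key=key.__getitem__)  # cheapest vertex to attach next
        remaining.remove(u)
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in remaining:
            if cost(u, v) < key[v]:
                key[v] = cost(u, v)
                parent[v] = u
    # Preorder walk: list a vertex when first encountered, then its children.
    tour, stack = [], [root]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour


def tour_cost(tour, cost):
    """Cost of the cycle that visits `tour` in order and returns to the start."""
    return sum(cost(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))


# Hypothetical points in the plane with euclidean costs.
points = {"a": (0, 0), "b": (1, 2), "c": (3, 1), "d": (5, 2), "e": (4, 4), "f": (2, 4)}


def euclidean(u, v):
    return dist(points[u], points[v])


tour = approx_tsp_tour(list(points), euclidean, "a")
print(tour, round(tour_cost(tour, euclidean), 3))
```

Because the costs here are euclidean, the triangle inequality holds, which is the condition under which the tour is guaranteed to cost at most twice the optimum.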

By Exercise 23.2-2, even with a simple implementation of MST-Prim, the running
time of Approx-TSP-Tour is Θ(V²). We now show that if the cost function
for an instance of the traveling-salesman problem satisfies the triangle inequality,
then Approx-TSP-Tour returns a tour whose cost is not more than twice the cost
of an optimal tour.

Theorem  35.2 

Approx-TSP-Tour  is  a  polynomial-time  2-approximation  algorithm  for  the 
traveling-salesman  problem  with  the  triangle  inequality. 

Proof  We  have  already  seen  that  Approx-TSP-Tour  runs  in  polynomial  time. 

Let H* denote an optimal tour for the given set of vertices. We obtain a spanning
tree by deleting any edge from a tour, and each edge cost is nonnegative. Therefore,
the weight of the minimum spanning tree T computed in line 2 of Approx-TSP-Tour
provides a lower bound on the cost of an optimal tour:

c(T) ≤ c(H*) . (35.4)

A  full  walk  of  T  lists  the  vertices  when  they  are  first  visited  and  also  whenever 
they  are  returned  to  after  a  visit  to  a  subtree.  Let  us  call  this  full  walk  W.  The  full 
walk  of  our  example  gives  the  order 

a, b, c, b, h, b, a, d, e, f, e, g, e, d, a .

Since  the  full  walk  traverses  every  edge  of  T  exactly  twice,  we  have  (extending 
our  definition  of  the  cost  c  in  the  natural  manner  to  handle  multisets  of  edges) 

c(W)  =  2c(T)  .  (35.5) 

Inequality  (35.4)  and  equation  (35.5)  imply  that 

c(W) ≤ 2c(H*) , (35.6)

and  so  the  cost  of  W  is  within  a  factor  of  2  of  the  cost  of  an  optimal  tour. 

Unfortunately, the full walk W is generally not a tour, since it visits some vertices
more than once. By the triangle inequality, however, we can delete a visit to
any vertex from W and the cost does not increase. (If we delete a vertex v from W
between visits to u and w, the resulting ordering specifies going directly from u
to w.) By repeatedly applying this operation, we can remove from W all but the
first visit to each vertex. In our example, this leaves the ordering

a, b, c, h, d, e, f, g .

This ordering is the same as that obtained by a preorder walk of the tree T. Let H
be the cycle corresponding to this preorder walk. It is a hamiltonian cycle, since every
vertex is visited exactly once, and in fact it is the cycle computed by Approx-TSP-Tour.
Since H is obtained by deleting vertices from the full walk W, we
have

c(H) ≤ c(W) . (35.7)

Combining inequalities (35.6) and (35.7) gives c(H) ≤ 2c(H*), which completes
the proof. ■
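
The two orderings used in the proof can be reproduced mechanically. In the Python sketch below, the tree is the one implied by the full walk quoted in the proof (a has children b and d, b has children c and h, d has child e, and e has children f and g); the function names and the dictionary representation are our own assumptions.

```python
def full_walk(children, root):
    """List a vertex when first visited and again after returning from each subtree."""
    walk = [root]
    for child in children.get(root, []):
        walk.extend(full_walk(children, child))
        walk.append(root)
    return walk


def shortcut(walk):
    """Delete every visit to a vertex except the first one."""
    seen, order = set(), []
    for v in walk:
        if v not in seen:
            seen.add(v)
            order.append(v)
    return order


# Tree read off from the full walk given in the proof.
children = {"a": ["b", "d"], "b": ["c", "h"], "d": ["e"], "e": ["f", "g"]}
walk = full_walk(children, "a")
print(",".join(walk))            # a,b,c,b,h,b,a,d,e,f,e,g,e,d,a
print(",".join(shortcut(walk)))  # a,b,c,h,d,e,f,g
```

The full walk has 15 entries because it crosses each of the 7 tree edges twice, which is the counting behind equation (35.5).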

In spite of the nice approximation ratio provided by Theorem 35.2,
Approx-TSP-Tour is usually not the best practical choice for this problem. There are other
approximation  algorithms  that  typically  perform  much  better  in  practice.  (See  the 
references  at  the  end  of  this  chapter.) 

35.2.2  The  general  traveling-salesman  problem 

If  we  drop  the  assumption  that  the  cost  function  c  satisfies  the  triangle  inequality, 
then  we  cannot  find  good  approximate  tours  in  polynomial  time  unless  P  =  NP. 

Theorem  35.3 

If P ≠ NP, then for any constant ρ ≥ 1, there is no polynomial-time approximation
algorithm with approximation ratio ρ for the general traveling-salesman problem.

Proof The proof is by contradiction. Suppose to the contrary that for some number
ρ ≥ 1, there is a polynomial-time approximation algorithm A with approximation
ratio ρ. Without loss of generality, we assume that ρ is an integer, by
rounding it up if necessary. We shall then show how to use A to solve instances
of the hamiltonian-cycle problem (defined in Section 34.2) in polynomial time.
Since Theorem 34.13 tells us that the hamiltonian-cycle problem is NP-complete,
Theorem 34.4 implies that if we can solve it in polynomial time, then P = NP.

Let G = (V, E) be an instance of the hamiltonian-cycle problem. We wish to
determine efficiently whether G contains a hamiltonian cycle by making use of
the hypothesized approximation algorithm A. We turn G into an instance of the
traveling-salesman problem as follows. Let G' = (V, E') be the complete graph
on V; that is,

E' = {(u, v) : u, v ∈ V and u ≠ v} .

Assign an integer cost to each edge in E' as follows:

c(u, v) = 1          if (u, v) ∈ E ,
          ρ|V| + 1   otherwise .

We can create representations of G' and c from a representation of G in time
polynomial in |V| and |E|.

Now, consider the traveling-salesman problem (G', c). If the original graph G
has a hamiltonian cycle H, then the cost function c assigns to each edge of H a
cost of 1, and so (G', c) contains a tour of cost |V|. On the other hand, if G does
not contain a hamiltonian cycle, then any tour of G' must use some edge not in E.
But any tour that uses an edge not in E has a cost of at least

(ρ|V| + 1) + (|V| − 1) = ρ|V| + |V|
                       > ρ|V| .

Because edges not in G are so costly, there is a gap of at least ρ|V| between the cost
of a tour that is a hamiltonian cycle in G (cost |V|) and the cost of any other tour
(cost at least ρ|V| + |V|). Therefore, the cost of a tour that is not a hamiltonian
cycle in G is at least a factor of ρ + 1 greater than the cost of a tour that is a
hamiltonian cycle in G.

Now, suppose that we apply the approximation algorithm A to the traveling-salesman
problem (G', c). Because A is guaranteed to return a tour of cost no
more than ρ times the cost of an optimal tour, if G contains a hamiltonian cycle,
then A must return it. If G has no hamiltonian cycle, then A returns a tour of cost
more than ρ|V|. Therefore, we can use A to solve the hamiltonian-cycle problem
in polynomial time. ■
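
The construction in the proof can be illustrated on tiny graphs. In the Python sketch below, the function names are our own, and an exhaustive search stands in for the hypothesized algorithm A; this is an assumption made only so that the example runs, since an exact solver trivially has approximation ratio ρ for any ρ ≥ 1 but takes exponential time. The sketch also assumes at least 3 vertices.

```python
from itertools import combinations, permutations


def tsp_instance(vertices, edges, rho):
    """Cost 1 for edges of G, rho*|V| + 1 for every other pair of vertices."""
    edge_set = {frozenset(edge) for edge in edges}
    expensive = rho * len(vertices) + 1
    return {frozenset(pair): 1 if frozenset(pair) in edge_set else expensive
            for pair in combinations(vertices, 2)}


def optimal_tour_cost(vertices, cost):
    """Stand-in for algorithm A: exhaustive search, usable only on tiny instances."""
    first, *rest = vertices
    best = None
    for perm in permutations(rest):
        tour = (first, *perm)
        total = sum(cost[frozenset((tour[i], tour[(i + 1) % len(tour)]))]
                    for i in range(len(tour)))
        best = total if best is None else min(best, total)
    return best


def has_hamiltonian_cycle(vertices, edges, rho=2):
    """Decide by thresholding: a returned tour of cost at most rho*|V| means yes."""
    cost = tsp_instance(vertices, edges, rho)
    return optimal_tour_cost(vertices, cost) <= rho * len(vertices)


vertices = ["a", "b", "c", "d"]
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]  # has a hamiltonian cycle
path = [("a", "b"), ("b", "c"), ("c", "d")]               # has none
print(has_hamiltonian_cycle(vertices, cycle))  # True
print(has_hamiltonian_cycle(vertices, path))   # False
```

With ρ = 2 and |V| = 4, a tour made only of edges of G costs 4, while any tour that uses a non-edge costs at least 9 + 3 = 12 > 8, so the threshold ρ|V| separates the two cases.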

The proof of Theorem 35.3 serves as an example of a general technique for
proving that we cannot approximate a problem very well. Suppose that given an
NP-hard problem X, we can produce in polynomial time a minimization problem
Y such that “yes” instances of X correspond to instances of Y with value at
most k (for some k), but that “no” instances of X correspond to instances of Y
with value greater than ρk. Then, we have shown that, unless P = NP, there is no
polynomial-time ρ-approximation algorithm for problem Y.

Exercises 


35.2-1 

Suppose that a complete undirected graph $G = (V, E)$ with at least 3 vertices has
a cost function $c$ that satisfies the triangle inequality. Prove that $c(u,v) \ge 0$ for all
$u, v \in V$.


35.2-2 

Show  how  in  polynomial  time  we  can  transform  one  instance  of  the  traveling- 
salesman  problem  into  another  instance  whose  cost  function  satisfies  the  triangle 
inequality.  The  two  instances  must  have  the  same  set  of  optimal  tours.  Explain 
why such a polynomial-time transformation does not contradict Theorem 35.3,
assuming that P $\ne$ NP.


35.2-3

Consider the following closest-point heuristic for building an approximate
traveling-salesman tour whose cost function satisfies the triangle inequality. Begin
with  a  trivial  cycle  consisting  of  a  single  arbitrarily  chosen  vertex.  At  each  step, 
identify  the  vertex  u  that  is  not  on  the  cycle  but  whose  distance  to  any  vertex  on  the 
cycle  is  minimum.  Suppose  that  the  vertex  on  the  cycle  that  is  nearest  u  is  vertex  v. 
Extend  the  cycle  to  include  u  by  inserting  u  just  after  v.  Repeat  until  all  vertices 
are  on  the  cycle.  Prove  that  this  heuristic  returns  a  tour  whose  total  cost  is  not  more 
than  twice  the  cost  of  an  optimal  tour. 


35.2-4 

In the bottleneck traveling-salesman problem, we wish to find the hamiltonian cycle
that minimizes the cost of the most costly edge in the cycle. Assuming that the
cost function satisfies the triangle inequality, show that there exists a polynomial-time
approximation algorithm with approximation ratio 3 for this problem. (Hint:
Show recursively that we can visit all the nodes in a bottleneck spanning tree, as
discussed in Problem 23-3, exactly once by taking a full walk of the tree and skipping
nodes, but without skipping more than two consecutive intermediate nodes.
Show that the costliest edge in a bottleneck spanning tree has a cost that is at most
the cost of the costliest edge in a bottleneck hamiltonian cycle.)


35.2-5 

Suppose  that  the  vertices  for  an  instance  of  the  traveling-salesman  problem  are 
points in the plane and that the cost $c(u,v)$ is the euclidean distance between
points  u  and  v.  Show  that  an  optimal  tour  never  crosses  itself. 


35.3  The  set-covering  problem 

The  set-covering  problem  is  an  optimization  problem  that  models  many  problems 
that require resources to be allocated. Its corresponding decision problem generalizes
the NP-complete vertex-cover problem and is therefore also NP-hard. The approximation
algorithm developed to handle the vertex-cover problem doesn’t apply
here,  however,  and  so  we  need  to  try  other  approaches.  We  shall  examine  a  simple 
greedy  heuristic  with  a  logarithmic  approximation  ratio.  That  is,  as  the  size  of  the 
instance  gets  larger,  the  size  of  the  approximate  solution  may  grow,  relative  to  the 
size  of  an  optimal  solution.  Because  the  logarithm  function  grows  rather  slowly, 
however,  this  approximation  algorithm  may  nonetheless  give  useful  results. 

Figure 35.3 An instance $(X, \mathcal{F})$ of the set-covering problem, where $X$ consists of the 12 black
points and $\mathcal{F} = \{S_1, S_2, S_3, S_4, S_5, S_6\}$. A minimum-size set cover is $\mathcal{C} = \{S_3, S_4, S_5\}$, with
size 3. The greedy algorithm produces a cover of size 4 by selecting either the sets $S_1$, $S_4$, $S_5$,
and $S_3$ or the sets $S_1$, $S_4$, $S_5$, and $S_6$, in order.

An instance $(X, \mathcal{F})$ of the set-covering problem consists of a finite set $X$ and
a family $\mathcal{F}$ of subsets of $X$, such that every element of $X$ belongs to at least one
subset in $\mathcal{F}$:

$$X = \bigcup_{S \in \mathcal{F}} S .$$

We say that a subset $S \in \mathcal{F}$ covers its elements. The problem is to find a minimum-size
subset $\mathcal{C} \subseteq \mathcal{F}$ whose members cover all of $X$:

$$X = \bigcup_{S \in \mathcal{C}} S . \qquad (35.8)$$

We say that any $\mathcal{C}$ satisfying equation (35.8) covers $X$. Figure 35.3 illustrates the
set-covering problem. The size of $\mathcal{C}$ is the number of sets it contains, rather than
the number of individual elements in these sets, since every subset $\mathcal{C}$ that covers $X$
must contain all $|X|$ individual elements. In Figure 35.3, the minimum set cover
has size 3.

The set-covering problem abstracts many commonly arising combinatorial problems.
As a simple example, suppose that $X$ represents a set of skills that are needed
to solve a problem and that we have a given set of people available to work on the
problem. We wish to form a committee, containing as few people as possible,
such that for every requisite skill in $X$, at least one member of the committee has
that  skill.  In  the  decision  version  of  the  set-covering  problem,  we  ask  whether  a 
covering  exists  with  size  at  most  k,  where  k  is  an  additional  parameter  specified 
in  the  problem  instance.  The  decision  version  of  the  problem  is  NP-complete,  as 
Exercise  35.3-2  asks  you  to  show. 
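
To make the committee example concrete, here is a small sketch that finds a minimum-size cover by exhaustive search. The skill names, the people, and the function names are invented for illustration, and the search takes exponential time, so it only illustrates the problem statement rather than a practical method.

```python
from itertools import combinations

def covers(universe, chosen_sets):
    """True if the union of chosen_sets contains every element of universe."""
    covered = set()
    for s in chosen_sets:
        covered |= s
    return universe <= covered

def minimum_set_cover(universe, family):
    """Smallest subfamily covering universe, found by trying sizes 0, 1, 2, ..."""
    for size in range(len(family) + 1):
        for candidate in combinations(family, size):
            if covers(universe, candidate):
                return list(candidate)
    return None  # the family does not cover the universe

# Hypothetical instance: each person is the set of skills that person has.
skills = {"design", "coding", "testing", "writing"}
people = [
    frozenset({"design", "coding"}),
    frozenset({"coding", "testing"}),
    frozenset({"testing", "writing"}),
    frozenset({"design"}),
]
committee = minimum_set_cover(skills, people)
```

The decision version with parameter $k$ then amounts to asking whether `len(minimum_set_cover(skills, people)) <= k`.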

A greedy approximation algorithm

The greedy method works by picking, at each stage, the set $S$ that covers the greatest
number of remaining elements that are uncovered.

Greedy-Set-Cover($X$, $\mathcal{F}$)

1  $U = X$
2  $\mathcal{C} = \emptyset$
3  while $U \ne \emptyset$
4      select an $S \in \mathcal{F}$ that maximizes $|S \cap U|$
5      $U = U - S$
6      $\mathcal{C} = \mathcal{C} \cup \{S\}$
7  return $\mathcal{C}$

In the example of Figure 35.3, Greedy-Set-Cover adds to $\mathcal{C}$, in order, the sets
$S_1$, $S_4$, and $S_5$, followed by either $S_3$ or $S_6$.

The algorithm works as follows. The set $U$ contains, at each stage, the set of
remaining uncovered elements. The set $\mathcal{C}$ contains the cover being constructed.
Line 4 is the greedy decision-making step, choosing a subset $S$ that covers as many
uncovered elements as possible (breaking ties arbitrarily). After $S$ is selected,
line 5 removes its elements from $U$, and line 6 places $S$ into $\mathcal{C}$. When the algorithm
terminates, the set $\mathcal{C}$ contains a subfamily of $\mathcal{F}$ that covers $X$.

We can easily implement Greedy-Set-Cover to run in time polynomial in $|X|$
and $|\mathcal{F}|$. Since the number of iterations of the loop on lines 3-6 is bounded from
above by $\min(|X|, |\mathcal{F}|)$, and we can implement the loop body to run in time
$O(|X| |\mathcal{F}|)$, a simple implementation runs in time $O(|X| |\mathcal{F}| \min(|X|, |\mathcal{F}|))$.
Exercise 35.3-3 asks for a linear-time algorithm.
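
The simple implementation described above might look as follows. This is a sketch under stated assumptions: sets are represented as Python `frozenset` objects, the name `greedy_set_cover` is our own, and ties are broken in favor of the set that appears first in the family. It is the straightforward polynomial-time version, not the linear-time one that Exercise 35.3-3 asks for.

```python
def greedy_set_cover(universe, family):
    """Return a list of sets from family covering universe, chosen greedily."""
    uncovered = set(universe)           # U in the pseudocode
    cover = []                          # C in the pseudocode
    while uncovered:
        # Line 4: pick a set covering the most uncovered elements;
        # max() keeps the first maximizer, so ties go to the earliest set.
        best = max(family, key=lambda s: len(s & uncovered))
        if not best & uncovered:
            raise ValueError("family does not cover the universe")
        uncovered -= best               # line 5
        cover.append(best)              # line 6
    return cover

# A small instance (ours, not Figure 35.3) on which greedy is not optimal:
# the two three-element sets form a cover of size 2, but greedy starts
# with the four-element set and ends up using all three sets.
universe = {1, 2, 3, 4, 5, 6}
family = [frozenset({3, 4, 5, 6}), frozenset({1, 3, 5}), frozenset({2, 4, 6})]
cover = greedy_set_cover(universe, family)
```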

Analysis 

We now show that the greedy algorithm returns a set cover that is not too much
larger than an optimal set cover. For convenience, in this chapter we denote the $d$th
harmonic number $H_d = \sum_{i=1}^{d} 1/i$ (see Section A.1) by $H(d)$. As a boundary
condition, we define $H(0) = 0$.

Theorem  35.4 

Greedy-Set-Cover is a polynomial-time $\rho(n)$-approximation algorithm, where
$\rho(n) = H(\max\{|S| : S \in \mathcal{F}\})$.

Proof  We have already shown that Greedy-Set-Cover runs in polynomial
time.

To show that Greedy-Set-Cover is a $\rho(n)$-approximation algorithm, we assign
a cost of 1 to each set selected by the algorithm, distribute this cost over
the elements covered for the first time, and then use these costs to derive the desired
relationship between the size of an optimal set cover $\mathcal{C}^*$ and the size of the
set cover $\mathcal{C}$ returned by the algorithm. Let $S_i$ denote the $i$th subset selected by
Greedy-Set-Cover; the algorithm incurs a cost of 1 when it adds $S_i$ to $\mathcal{C}$. We
spread this cost of selecting $S_i$ evenly among the elements covered for the first time
by $S_i$. Let $c_x$ denote the cost allocated to element $x$, for each $x \in X$. Each element
is assigned a cost only once, when it is covered for the first time. If $x$ is covered
for the first time by $S_i$, then

$$c_x = \frac{1}{|S_i - (S_1 \cup S_2 \cup \cdots \cup S_{i-1})|} .$$

Each step of the algorithm assigns 1 unit of cost, and so

$$|\mathcal{C}| = \sum_{x \in X} c_x . \qquad (35.9)$$

Each element $x \in X$ is in at least one set in the optimal cover $\mathcal{C}^*$, and so we have

$$\sum_{S \in \mathcal{C}^*} \sum_{x \in S} c_x \ge \sum_{x \in X} c_x . \qquad (35.10)$$

Combining equation (35.9) and inequality (35.10), we have that

$$|\mathcal{C}| \le \sum_{S \in \mathcal{C}^*} \sum_{x \in S} c_x . \qquad (35.11)$$

The remainder of the proof rests on the following key inequality, which we shall
prove shortly. For any set $S$ belonging to the family $\mathcal{F}$,

$$\sum_{x \in S} c_x \le H(|S|) . \qquad (35.12)$$

From inequalities (35.11) and (35.12), it follows that

$$|\mathcal{C}| \le \sum_{S \in \mathcal{C}^*} H(|S|) \le |\mathcal{C}^*| \cdot H(\max\{|S| : S \in \mathcal{F}\}) ,$$

thus proving the theorem.

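All that remains is to prove inequality (35.12). Consider any set $S \in \mathcal{F}$ and any
$i = 1, 2, \ldots, |\mathcal{C}|$, and let

$$u_i = |S - (S_1 \cup S_2 \cup \cdots \cup S_i)|$$

be the number of elements in $S$ that remain uncovered after the algorithm has
selected sets $S_1, S_2, \ldots, S_i$. We define $u_0 = |S|$ to be the number of elements
of $S$, which are all initially uncovered. Let $k$ be the least index such that $u_k = 0$,
so that every element in $S$ is covered by at least one of the sets $S_1, S_2, \ldots, S_k$ and
some element in $S$ is uncovered by $S_1 \cup S_2 \cup \cdots \cup S_{k-1}$. Then, $u_{i-1} \ge u_i$, and
$u_{i-1} - u_i$ elements of $S$ are covered for the first time by $S_i$, for $i = 1, 2, \ldots, k$.
Thus,

$$\sum_{x \in S} c_x = \sum_{i=1}^{k} (u_{i-1} - u_i) \cdot \frac{1}{|S_i - (S_1 \cup S_2 \cup \cdots \cup S_{i-1})|} .$$

Observe that

$$|S_i - (S_1 \cup S_2 \cup \cdots \cup S_{i-1})| \ge |S - (S_1 \cup S_2 \cup \cdots \cup S_{i-1})| = u_{i-1} ,$$

because the greedy choice of $S_i$ guarantees that $S$ cannot cover more new elements
than $S_i$ does (otherwise, the algorithm would have chosen $S$ instead of $S_i$).
Consequently, we obtain

$$\sum_{x \in S} c_x \le \sum_{i=1}^{k} (u_{i-1} - u_i) \cdot \frac{1}{u_{i-1}} .$$

We now bound this quantity as follows:

$$\begin{aligned}
\sum_{i=1}^{k} (u_{i-1} - u_i) \cdot \frac{1}{u_{i-1}}
  &= \sum_{i=1}^{k} \sum_{j=u_i+1}^{u_{i-1}} \frac{1}{u_{i-1}} \\
  &\le \sum_{i=1}^{k} \sum_{j=u_i+1}^{u_{i-1}} \frac{1}{j} && \text{(because } j \le u_{i-1}\text{)} \\
  &= \sum_{i=1}^{k} \left( \sum_{j=1}^{u_{i-1}} \frac{1}{j} - \sum_{j=1}^{u_i} \frac{1}{j} \right) \\
  &= \sum_{i=1}^{k} \bigl( H(u_{i-1}) - H(u_i) \bigr) \\
  &= H(u_0) - H(u_k) && \text{(because the sum telescopes)} \\
  &= H(u_0) - H(0) \\
  &= H(u_0) && \text{(because } H(0) = 0\text{)} \\
  &= H(|S|) ,
\end{aligned}$$

which completes the proof of inequality (35.12).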
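
The charging argument can be checked numerically on a small instance. The sketch below is illustrative only: the function names and the instance are our own, and exact rational arithmetic (`fractions.Fraction`) is used so that the comparisons with harmonic numbers are not affected by rounding.

```python
from fractions import Fraction

def harmonic(d):
    """H(d) = 1 + 1/2 + ... + 1/d, with H(0) = 0."""
    return sum((Fraction(1, j) for j in range(1, d + 1)), Fraction(0))

def greedy_with_costs(universe, family):
    """Run the greedy rule and return (cover, cost), where cost[x] is c_x."""
    uncovered, cover, cost = set(universe), [], {}
    while uncovered:
        best = max(family, key=lambda s: len(s & uncovered))
        newly = best & uncovered        # elements covered for the first time
        if not newly:
            raise ValueError("family does not cover the universe")
        for x in newly:
            cost[x] = Fraction(1, len(newly))   # unit cost spread evenly
        uncovered -= newly
        cover.append(best)
    return cover, cost

universe = set(range(1, 7))
family = [frozenset({3, 4, 5, 6}), frozenset({1, 3, 5}), frozenset({2, 4, 6})]
cover, cost = greedy_with_costs(universe, family)

# Equation (35.9): the costs add up to the number of sets chosen.
total_cost = sum(cost.values())
# Inequality (35.12): the cost charged inside any set S is at most H(|S|).
charges_ok = all(sum(cost[x] for x in s) <= harmonic(len(s)) for s in family)
```

Here the first set chosen charges 1/4 to each of its four elements, the next two sets each charge 1 to a single new element, and the set $\{1, 3, 5\}$ is charged $1 + 1/4 + 1/4 = 3/2$, which is below $H(3) = 11/6$.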

Corollary 35.5

Greedy-Set-Cover is a polynomial-time $(\ln |X| + 1)$-approximation algorithm.

Proof  Use inequality (A.14) and Theorem 35.4. ■

In some applications, $\max\{|S| : S \in \mathcal{F}\}$ is a small constant, and so the solution
returned by Greedy-Set-Cover is at most a small constant times larger than
optimal. One such application occurs when this heuristic finds an approximate
vertex cover for a graph whose vertices have degree at most 3. In this case, the
solution found by Greedy-Set-Cover is not more than $H(3) = 11/6$ times as
large as an optimal solution, a performance guarantee that is slightly better than
that of Approx-Vertex-Cover.

Exercises 


35.3-1 

Consider each of the following words as a set of letters: {arid, dash, drain,
heard, lost, nose, shun, slate, snare, thread}. Show which set cover
Greedy-Set-Cover produces when we break ties in favor of the word that appears
first in the dictionary.


35.3-2 

Show that the decision version of the set-covering problem is NP-complete by
reducing the vertex-cover problem to it.


35.3-3

Show how to implement Greedy-Set-Cover in such a way that it runs in time
$O\bigl(\sum_{S \in \mathcal{F}} |S|\bigr)$.


35.3-4

Show that the following weaker form of Theorem 35.4 is trivially true:

$$|\mathcal{C}| \le |\mathcal{C}^*| \max\{|S| : S \in \mathcal{F}\} .$$


35.3-5 

Greedy-Set-Cover can return a number of different solutions, depending on
how we break ties in line 4. Give a procedure Bad-Set-Cover-Instance($n$)
that returns an $n$-element instance of the set-covering problem for which, depending
on how we break ties in line 4, Greedy-Set-Cover can return a number of
different solutions that is exponential in $n$.


35.4  Randomization and linear programming

In this section, we study two useful techniques for designing approximation algorithms:
randomization and linear programming. We shall give a simple randomized
algorithm for an optimization version of 3-CNF satisfiability, and then we shall use
linear programming to help design an approximation algorithm for a weighted version
of the vertex-cover problem. This section only scratches the surface of these
two powerful techniques. The chapter notes give references for further study of
these areas.

A  randomized  approximation  algorithm  for  MAX-3-CNF  satisfiability 

Just as some randomized algorithms compute exact solutions, some randomized
algorithms compute approximate solutions. We say that a randomized algorithm
for a problem has an approximation ratio of $\rho(n)$ if, for any input of size $n$, the
expected cost $C$ of the solution produced by the randomized algorithm is within a
factor of $\rho(n)$ of the cost $C^*$ of an optimal solution:

$$\max\left( \frac{C}{C^*}, \frac{C^*}{C} \right) \le \rho(n) . \qquad (35.13)$$

We call a randomized algorithm that achieves an approximation ratio of $\rho(n)$ a
randomized $\rho(n)$-approximation algorithm. In other words, a randomized
approximation algorithm is like a deterministic approximation algorithm, except that
the approximation ratio is for an expected cost.

A  particular  instance  of  3-CNF  satisfiability,  as  defined  in  Section  34.4,  may  or 
may  not  be  satisfiable.  In  order  to  be  satisfiable,  there  must  exist  an  assignment  of 
the variables so that every clause evaluates to 1. If an instance is not satisfiable, we
may  want  to  compute  how  “close”  to  satisfiable  it  is,  that  is,  we  may  wish  to  find  an 
assignment  of  the  variables  that  satisfies  as  many  clauses  as  possible.  We  call  the 
resulting  maximization  problem  MAX-3-CNF  satisfiability.  The  input  to  MAX-3- 
CNF  satisfiability  is  the  same  as  for  3-CNF  satisfiability,  and  the  goal  is  to  return 
an  assignment  of  the  variables  that  maximizes  the  number  of  clauses  evaluating 
to 1. We now show that randomly setting each variable to 1 with probability 1/2
and  to  0  with  probability  1/2  yields  a  randomized  8/7-approximation  algorithm. 
According  to  the  definition  of  3-CNF  satisfiability  from  Section  34.4,  we  require 
each  clause  to  consist  of  exactly  three  distinct  literals.  We  further  assume  that 
no  clause  contains  both  a  variable  and  its  negation.  (Exercise  35.4-1  asks  you  to 
remove  this  last  assumption.) 

Theorem 35.6

Given an instance of MAX-3-CNF satisfiability with $n$ variables $x_1, x_2, \ldots, x_n$
and $m$ clauses, the randomized algorithm that independently sets each variable
to 1 with probability 1/2 and to 0 with probability 1/2 is a randomized
8/7-approximation algorithm.


Proof  Suppose that we have independently set each variable to 1 with probability
1/2 and to 0 with probability 1/2. For $i = 1, 2, \ldots, m$, we define the indicator
random variable

$$Y_i = \mathrm{I}\{\text{clause } i \text{ is satisfied}\} ,$$

so that $Y_i = 1$ as long as we have set at least one of the literals in the $i$th clause
to 1. Since no literal appears more than once in the same clause, and since we have
assumed that no variable and its negation appear in the same clause, the settings of
the three literals in each clause are independent. A clause is not satisfied only if all
three of its literals are set to 0, and so $\Pr\{\text{clause } i \text{ is not satisfied}\} = (1/2)^3 = 1/8$.
Thus, we have $\Pr\{\text{clause } i \text{ is satisfied}\} = 1 - 1/8 = 7/8$, and by Lemma 5.1,
we have $\mathrm{E}[Y_i] = 7/8$. Let $Y$ be the number of satisfied clauses overall, so that
$Y = Y_1 + Y_2 + \cdots + Y_m$. Then, we have

$$\begin{aligned}
\mathrm{E}[Y] &= \mathrm{E}\left[ \sum_{i=1}^{m} Y_i \right] \\
  &= \sum_{i=1}^{m} \mathrm{E}[Y_i] && \text{(by linearity of expectation)} \\
  &= \sum_{i=1}^{m} 7/8 \\
  &= 7m/8 .
\end{aligned}$$

Clearly, $m$ is an upper bound on the number of satisfied clauses, and hence the
approximation ratio is at most $m/(7m/8) = 8/7$. ■
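
The value $\mathrm{E}[Y] = 7m/8$ can be confirmed exactly on a small formula by averaging over all $2^n$ assignments, which is what the uniform random choice does in expectation. The following sketch uses an encoding of our own choosing (a literal is a nonzero integer, with $+i$ for $x_i$ and $-i$ for its negation); the formula itself is made up for illustration.

```python
from fractions import Fraction
from itertools import product

def satisfied_count(clauses, assignment):
    """Number of clauses with at least one true literal.

    A literal is a nonzero int: +i means x_i, -i means NOT x_i (1-based)."""
    def literal_true(lit):
        value = assignment[abs(lit) - 1]
        return value if lit > 0 else not value
    return sum(any(literal_true(l) for l in clause) for clause in clauses)

def expected_satisfied(clauses, n):
    """Exact E[Y] under uniform random assignments, by enumerating all 2^n."""
    total = sum(satisfied_count(clauses, a)
                for a in product([False, True], repeat=n))
    return Fraction(total, 2 ** n)

# Each clause has three distinct variables and no variable with its negation.
clauses = [(1, -2, 3), (-1, 2, 4), (2, -3, -4), (-1, -2, -3)]
expectation = expected_satisfied(clauses, 4)
```

With $m = 4$ clauses the computed expectation is $7 \cdot 4 / 8 = 7/2$, matching the proof.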


Approximating  weighted  vertex  cover  using  linear  programming 

In the minimum-weight vertex-cover problem, we are given an undirected graph
$G = (V, E)$ in which each vertex $v \in V$ has an associated positive weight $w(v)$.
For any vertex cover $V' \subseteq V$, we define the weight of the vertex cover
$w(V') = \sum_{v \in V'} w(v)$. The goal is to find a vertex cover of minimum weight.

We  cannot  apply  the  algorithm  used  for  unweighted  vertex  cover,  nor  can  we  use 
a  random  solution;  both  methods  may  return  solutions  that  are  far  from  optimal. 
We  shall,  however,  compute  a  lower  bound  on  the  weight  of  the  minimum-weight 
vertex cover, by using a linear program. We shall then “round” this solution and
use  it  to  obtain  a  vertex  cover. 

Suppose that we associate a variable $x(v)$ with each vertex $v \in V$, and let us
require that $x(v)$ equals either 0 or 1 for each $v \in V$. We put $v$ into the vertex cover
if and only if $x(v) = 1$. Then, we can write the constraint that for any edge $(u, v)$,
at least one of $u$ and $v$ must be in the vertex cover as $x(u) + x(v) \ge 1$. This view
gives rise to the following 0-1 integer program for finding a minimum-weight
vertex cover:

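$$\begin{aligned}
\text{minimize} \quad & \sum_{v \in V} w(v)\, x(v) && && (35.14) \\
\text{subject to} \quad & x(u) + x(v) \ge 1 && \text{for each } (u, v) \in E && (35.15) \\
& x(v) \in \{0, 1\} && \text{for each } v \in V . && (35.16)
\end{aligned}$$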

In the special case in which all the weights $w(v)$ are equal to 1, this formulation
is the optimization version of the NP-hard vertex-cover problem. Suppose,
however, that we remove the constraint that $x(v) \in \{0, 1\}$ and replace it
by $0 \le x(v) \le 1$. We then obtain the following linear program, which is known as
the linear-programming relaxation:


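$$\begin{aligned}
\text{minimize} \quad & \sum_{v \in V} w(v)\, x(v) && && (35.17) \\
\text{subject to} \quad & x(u) + x(v) \ge 1 && \text{for each } (u, v) \in E && (35.18) \\
& x(v) \le 1 && \text{for each } v \in V && (35.19) \\
& x(v) \ge 0 && \text{for each } v \in V . && (35.20)
\end{aligned}$$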

Any feasible solution to the 0-1 integer program in lines (35.14)-(35.16) is also
a feasible solution to the linear program in lines (35.17)-(35.20). Therefore, the
value of an optimal solution to the linear program gives a lower bound on the value
of an optimal solution to the 0-1 integer program, and hence a lower bound on the
optimal weight in the minimum-weight vertex-cover problem.

The  following  procedure  uses  the  solution  to  the  linear-programming  relaxation 
to  construct  an  approximate  solution  to  the  minimum-weight  vertex-cover  problem: 

Approx-Min-Weight-VC($G$, $w$)

1  $C = \emptyset$
2  compute $x$, an optimal solution to the linear program in lines (35.17)-(35.20)
3  for each $v \in V$
4      if $x(v) \ge 1/2$
5          $C = C \cup \{v\}$
6  return $C$

The Approx-Min-Weight-VC procedure works as follows. Line 1 initializes
the vertex cover to be empty. Line 2 formulates the linear program in
lines (35.17)-(35.20) and then solves this linear program. An optimal solution
gives each vertex $v$ an associated value $x(v)$, where $0 \le x(v) \le 1$. We use this
value to guide the choice of which vertices to add to the vertex cover $C$ in lines 3-5.
If $x(v) \ge 1/2$, we add $v$ to $C$; otherwise we do not. In effect, we are “rounding”
each fractional variable in the solution to the linear program to 0 or 1 in order to
obtain a solution to the 0-1 integer program in lines (35.14)-(35.16). Finally, line 6
returns the vertex cover $C$.
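
The rounding step in lines 3-5 is simple enough to sketch directly. In the code below, solving the linear program (line 2) is deliberately left out: the dictionary `x` is assumed to hold an optimal LP solution obtained from some solver, and the function names and the triangle instance are our own illustration. For a triangle with unit weights, adding the three edge constraints gives $2(x(a) + x(b) + x(c)) \ge 3$, so the assignment $x(v) = 1/2$ for every vertex is optimal with value 3/2.

```python
from fractions import Fraction

def round_lp_solution(vertices, edges, x):
    """Lines 3-5 of the procedure: keep every vertex whose LP value is >= 1/2.

    x maps each vertex to its value in an optimal LP solution, which the
    caller is assumed to have obtained from some LP solver (not shown)."""
    half = Fraction(1, 2)
    for u, v in edges:
        # Constraint (35.18) must hold, or x is not feasible for the LP.
        if x[u] + x[v] < 1:
            raise ValueError(f"edge ({u}, {v}) violates x(u) + x(v) >= 1")
    return {v for v in vertices if x[v] >= half}

def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in cover."""
    return all(u in cover or v in cover for u, v in edges)

# Triangle with unit weights: the LP optimum puts 1/2 on every vertex.
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("a", "c")]
weight = {v: 1 for v in vertices}
x = {v: Fraction(1, 2) for v in vertices}

cover = round_lp_solution(vertices, edges, x)
z_star = sum(weight[v] * x[v] for v in vertices)    # LP optimal value
cover_weight = sum(weight[v] for v in cover)
```

On this instance rounding selects all three vertices, so the cover has weight 3, which is exactly twice the LP value 3/2; an optimal vertex cover of the triangle has weight 2.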

Theorem  35.7 

Algorithm Approx-Min-Weight-VC is a polynomial-time 2-approximation algorithm
for the minimum-weight vertex-cover problem.

Proof  Because there is a polynomial-time algorithm to solve the linear program
in line 2, and because the for loop of lines 3-5 runs in polynomial time,
Approx-Min-Weight-VC is a polynomial-time algorithm.

Now we show that Approx-Min-Weight-VC is a 2-approximation algorithm.
Let $C^*$ be an optimal solution to the minimum-weight vertex-cover problem,
and let $z^*$ be the value of an optimal solution to the linear program in
lines (35.17)-(35.20). Since an optimal vertex cover is a feasible solution to the
linear program, $z^*$ must be a lower bound on $w(C^*)$, that is,

$$z^* \le w(C^*) . \qquad (35.21)$$

Next, we claim that by rounding the fractional values of the variables $x(v)$, we
produce a set $C$ that is a vertex cover and satisfies $w(C) \le 2z^*$. To see that $C$ is
a vertex cover, consider any edge $(u, v) \in E$. By constraint (35.18), we know that
$x(u) + x(v) \ge 1$, which implies that at least one of $x(u)$ and $x(v)$ is at least 1/2.
Therefore, at least one of $u$ and $v$ is included in the vertex cover, and so every edge
is covered.

Now,  we  consider  the  weight  of  the  cover.  We  have 

$$\begin{aligned}
z^* &= \sum_{v \in V} w(v)\, x(v) \\
  &\ge \sum_{v \in V : x(v) \ge 1/2} w(v)\, x(v) \\
  &\ge \sum_{v \in V : x(v) \ge 1/2} w(v) \cdot \frac{1}{2} \\
  &= \sum_{v \in C} w(v) \cdot \frac{1}{2} \\
  &= \frac{1}{2} \sum_{v \in C} w(v) \\
  &= \frac{1}{2}\, w(C) . && (35.22)
\end{aligned}$$

Combining inequalities (35.21) and (35.22) gives

$$w(C) \le 2z^* \le 2w(C^*) ,$$

and hence Approx-Min-Weight-VC is a 2-approximation algorithm. ■

Exercises 


35.4-1 

Show that even if we allow a clause to contain both a variable and its negation,
randomly setting each variable to 1 with probability 1/2 and to 0 with probability 1/2
still yields a randomized 8/7-approximation algorithm.


35.4-2 

The MAX-CNF satisfiability problem is like the MAX-3-CNF satisfiability problem,
except that it does not restrict each clause to have exactly 3 literals. Give a
randomized  2-approximation  algorithm  for  the  MAX-CNF  satisfiability  problem. 


35.4-3 

In the MAX-CUT problem, we are given an unweighted undirected graph
$G = (V, E)$. We define a cut $(S, V - S)$ as in Chapter 23 and the weight of a cut as the
number of edges crossing the cut. The goal is to find a cut of maximum weight.
Suppose that for each vertex $v$, we randomly and independently place $v$ in $S$ with
probability 1/2 and in $V - S$ with probability 1/2. Show that this algorithm is a
randomized 2-approximation algorithm.


35.4-4

Show that the constraints in line (35.19) are redundant in the sense that if we remove
them from the linear program in lines (35.17)-(35.20), any optimal solution
to the resulting linear program must satisfy $x(v) \le 1$ for each $v \in V$.


35.5  The  subset-sum  problem 

Recall from Section 34.5.5 that an instance of the subset-sum problem is a
pair $(S, t)$, where $S$ is a set $\{x_1, x_2, \ldots, x_n\}$ of positive integers and $t$ is a positive
integer. This decision problem asks whether there exists a subset of $S$ that
adds up exactly to the target value $t$. As we saw in Section 34.5.5, this problem is
NP-complete.

The optimization problem associated with this decision problem arises in practical
applications. In the optimization problem, we wish to find a subset of
$\{x_1, x_2, \ldots, x_n\}$ whose sum is as large as possible but not larger than $t$. For example,
we may have a truck that can carry no more than $t$ pounds, and $n$ different
boxes to ship, the $i$th of which weighs $x_i$ pounds. We wish to fill the truck with as
heavy a load as possible without exceeding the given weight limit.

In this section, we present an exponential-time algorithm that computes the optimal
value for this optimization problem, and then we show how to modify the
algorithm so that it becomes a fully polynomial-time approximation scheme. (Recall
that a fully polynomial-time approximation scheme has a running time that is
polynomial in $1/\epsilon$ as well as in the size of the input.)

An  exponential-time  exact  algorithm 

Suppose that we computed, for each subset $S'$ of $S$, the sum of the elements
in $S'$, and then we selected, among the subsets whose sum does not exceed $t$,
the one whose sum was closest to $t$. Clearly this algorithm would return the optimal
solution, but it could take exponential time. To implement this algorithm,
we could use an iterative procedure that, in iteration $i$, computes the sums of
all subsets of $\{x_1, x_2, \ldots, x_i\}$, using as a starting point the sums of all subsets
of $\{x_1, x_2, \ldots, x_{i-1}\}$. In doing so, we would realize that once a particular subset $S'$
had a sum exceeding $t$, there would be no reason to maintain it, since no superset
of $S'$ could be the optimal solution. We now give an implementation of this
strategy.

The procedure Exact-Subset-Sum takes an input set $S = \{x_1, x_2, \ldots, x_n\}$
and a target value $t$; we’ll see its pseudocode in a moment. This procedure
iteratively computes $L_i$, the list of sums of all subsets of $\{x_1, \ldots, x_i\}$ that do not
exceed $t$, and then it returns the maximum value in $L_n$.

If $L$ is a list of positive integers and $x$ is another positive integer, then we let
$L + x$ denote the list of integers derived from $L$ by increasing each element of $L$
by $x$. For example, if $L = \langle 1, 2, 3, 5, 9 \rangle$, then $L + 2 = \langle 3, 4, 5, 7, 11 \rangle$. We also use
this notation for sets, so that

$$S + x = \{ s + x : s \in S \} .$$

We also use an auxiliary procedure Merge-Lists($L$, $L'$), which returns the
sorted list that is the merge of its two sorted input lists $L$ and $L'$ with duplicate
values removed. Like the Merge procedure we used in merge sort (Section 2.3.1),
Merge-Lists runs in time $O(|L| + |L'|)$. We omit the pseudocode for
Merge-Lists.
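
Although the text omits the pseudocode for Merge-Lists, one possible realization is sketched below. This is our own illustration, not the authors' procedure: the names `add_to_each` and `merge_lists` are hypothetical, and Python lists stand in for the sorted lists of the text.

```python
def add_to_each(lst, x):
    """The list L + x: every element of lst increased by x."""
    return [y + x for y in lst]

def merge_lists(a, b):
    """Merge two sorted lists into one sorted list without duplicates.

    Runs in time proportional to len(a) + len(b)."""
    merged = []
    i = j = 0
    while i < len(a) or j < len(b):
        # Take the smaller front element (or whatever remains).
        if j == len(b) or (i < len(a) and a[i] <= b[j]):
            value, i = a[i], i + 1
        else:
            value, j = b[j], j + 1
        if not merged or merged[-1] != value:   # skip duplicate values
            merged.append(value)
    return merged

L = [1, 2, 3, 5, 9]
shifted = add_to_each(L, 2)
merged = merge_lists(L, shifted)
```

For the list in the example above, `shifted` is the list 3, 4, 5, 7, 11 and `merged` is 1, 2, 3, 4, 5, 7, 9, 11, with the repeated values 3 and 5 kept only once.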

Exact-Subset-Sum($S$, $t$)

1  $n = |S|$
2  $L_0 = \langle 0 \rangle$
3  for $i = 1$ to $n$
4      $L_i$ = Merge-Lists($L_{i-1}$, $L_{i-1} + x_i$)
5      remove from $L_i$ every element that is greater than $t$
6  return the largest element in $L_n$

To see how Exact-Subset-Sum works, let $P_i$ denote the set of all values
obtained by selecting a (possibly empty) subset of $\{x_1, x_2, \ldots, x_i\}$ and summing
its members. For example, if $S = \{1, 4, 5\}$, then

$P_1 = \{0, 1\}$ ,
$P_2 = \{0, 1, 4, 5\}$ ,
$P_3 = \{0, 1, 4, 5, 6, 9, 10\}$ .

Given the identity

$$P_i = P_{i-1} \cup (P_{i-1} + x_i) , \qquad (35.23)$$

we can prove by induction on $i$ (see Exercise 35.5-1) that the list $L_i$ is a sorted list
containing every element of $P_i$ whose value is not more than $t$. Since the length
of $L_i$ can be as much as $2^i$, Exact-Subset-Sum is an exponential-time algorithm
in general, although it is a polynomial-time algorithm in the special cases in which $t$
is polynomial in $|S|$ or all the numbers in $S$ are bounded by a polynomial in $|S|$.
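
A direct transcription of Exact-Subset-Sum into Python might look like the sketch below. It is illustrative only: for brevity it builds each list with `sorted(set(...))`, which stands in for Merge-Lists and costs an extra logarithmic factor compared with the linear-time merge described in the text; the function name is our own.

```python
def exact_subset_sum(S, t):
    """Largest subset sum of S not exceeding t (exponential time in general)."""
    sums = [0]                                      # L_0
    for x in S:
        shifted = [y + x for y in sums]             # L_{i-1} + x_i
        merged = sorted(set(sums) | set(shifted))   # stands in for Merge-Lists
        sums = [y for y in merged if y <= t]        # line 5: drop sums above t
    return sums[-1]                                 # sorted, so last is largest

best = exact_subset_sum([1, 4, 5], 8)
```

With $S = \{1, 4, 5\}$ and $t = 8$, the final list is 0, 1, 4, 5, 6 (the elements of $P_3$ that do not exceed 8), so the procedure returns 6.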

A  fully  polynomial-time  approximation  scheme 

We can derive a fully polynomial-time approximation scheme for the subset-sum
problem by “trimming” each list $L_i$ after it is created. The idea behind trimming is
that if two values in $L$ are close to each other, then since we want just an approximate
solution, we do not need to maintain both of them explicitly. More precisely,
we use a trimming parameter $\delta$ such that $0 < \delta < 1$. When we trim a list $L$ by $\delta$,
we remove as many elements from $L$ as possible, in such a way that if $L'$ is the
result of trimming $L$, then for every element $y$ that was removed from $L$, there is
an element $z$ still in $L'$ that approximates $y$, that is,

$$\frac{y}{1 + \delta} \le z \le y . \qquad (35.24)$$

We can think of such a $z$ as “representing” $y$ in the new list $L'$. Each removed
element $y$ is represented by a remaining element $z$ satisfying inequality (35.24).
For example, if $\delta = 0.1$ and

$$L = \langle 10, 11, 12, 15, 20, 21, 22, 23, 24, 29 \rangle ,$$

then we can trim $L$ to obtain

$$L' = \langle 10, 12, 15, 20, 23, 29 \rangle ,$$

where the deleted value 11 is represented by 10, the deleted values 21 and 22
are represented by 20, and the deleted value 24 is represented by 23. Because
every element of the trimmed version of the list is also an element of the original
version of the list, trimming can dramatically decrease the number of elements kept
while keeping a close (and slightly smaller) representative value in the list for each
deleted element.

The following procedure trims list $L = \langle y_1, y_2, \ldots, y_m \rangle$ in time $\Theta(m)$, given $L$
and $\delta$, and assuming that $L$ is sorted into monotonically increasing order. The
output of the procedure is a trimmed, sorted list.

Trim($L$, $\delta$)

1  let $m$ be the length of $L$
2  $L' = \langle y_1 \rangle$
3  $last = y_1$
4  for $i = 2$ to $m$
5      if $y_i > last \cdot (1 + \delta)$      // $y_i \ge last$ because $L$ is sorted
6          append $y_i$ onto the end of $L'$
7          $last = y_i$
8  return $L'$

The procedure scans the elements of $L$ in monotonically increasing order. A number
is appended onto the returned list $L'$ only if it is the first element of $L$ or if it
cannot be represented by the most recent number placed into $L'$.
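
As a sanity check on the example with $\delta = 0.1$, the Trim procedure can be transcribed as follows. This is a minimal sketch with a name of our choosing; it assumes a nonempty sorted list, and it takes $\delta$ as a `fractions.Fraction` so that boundary comparisons such as $11 > 10 \cdot 1.1$ are evaluated exactly rather than in floating point.

```python
from fractions import Fraction

def trim(L, delta):
    """Trim a sorted, nonempty list L by delta, scanning it once."""
    trimmed = [L[0]]
    last = L[0]
    for y in L[1:]:
        if y > last * (1 + delta):   # last is too small to represent y
            trimmed.append(y)
            last = y
    return trimmed

L = [10, 11, 12, 15, 20, 21, 22, 23, 24, 29]
delta = Fraction(1, 10)
L_trimmed = trim(L, delta)

# Every removed y has a representative z in the trimmed list
# with y / (1 + delta) <= z <= y, as inequality (35.24) requires.
represented = all(
    any(Fraction(y) / (1 + delta) <= z <= y for z in L_trimmed)
    for y in L if y not in L_trimmed
)
```

The result `L_trimmed` is the list 10, 12, 15, 20, 23, 29 given in the text.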

Given the procedure Trim, we can construct our approximation scheme as follows.
This procedure takes as input a set $S = \{x_1, x_2, \ldots, x_n\}$ of $n$ integers (in
arbitrary order), a target integer $t$, and an “approximation parameter” $\epsilon$, where

$$0 < \epsilon < 1 . \qquad (35.25)$$

It returns a value $z$ whose value is within a $1 + \epsilon$ factor of the optimal solution.

APPROX-SUBSET-SUM(S, t, ε)
1  n = |S|
2  L_0 = ⟨0⟩
3  for i = 1 to n
4      L_i = MERGE-LISTS(L_{i-1}, L_{i-1} + x_i)
5      L_i = TRIM(L_i, ε/2n)
6      remove from L_i every element that is greater than t
7  let z* be the largest value in L_n
8  return z*

Line 2 initializes the list $L_0$ to be the list containing just the element 0. The for
loop in lines 3-6 computes $L_i$ as a sorted list containing a suitably trimmed version
of the set $P_i$, with all elements larger than $t$ removed. Since we create $L_i$
from $L_{i-1}$, we must ensure that the repeated trimming doesn't introduce too much
compounded inaccuracy. In a moment, we shall see that APPROX-SUBSET-SUM
returns a correct approximation if one exists.

As an example, suppose we have the instance

$$S = \langle 104, 102, 201, 101 \rangle$$

with $t = 308$ and $\epsilon = 0.40$. The trimming parameter $\delta$ is $\epsilon/8 = 0.05$.
APPROX-SUBSET-SUM computes the following values on the indicated lines:


line 2:  $L_0 = \langle 0 \rangle$ ,

line 4:  $L_1 = \langle 0, 104 \rangle$ ,
line 5:  $L_1 = \langle 0, 104 \rangle$ ,
line 6:  $L_1 = \langle 0, 104 \rangle$ ,

line 4:  $L_2 = \langle 0, 102, 104, 206 \rangle$ ,
line 5:  $L_2 = \langle 0, 102, 206 \rangle$ ,
line 6:  $L_2 = \langle 0, 102, 206 \rangle$ ,

line 4:  $L_3 = \langle 0, 102, 201, 206, 303, 407 \rangle$ ,
line 5:  $L_3 = \langle 0, 102, 201, 303, 407 \rangle$ ,
line 6:  $L_3 = \langle 0, 102, 201, 303 \rangle$ ,

line 4:  $L_4 = \langle 0, 101, 102, 201, 203, 302, 303, 404 \rangle$ ,
line 5:  $L_4 = \langle 0, 101, 201, 302, 404 \rangle$ ,
line 6:  $L_4 = \langle 0, 101, 201, 302 \rangle$ .

The algorithm returns $z^* = 302$ as its answer, which is well within $\epsilon = 40\%$ of
the optimal answer $307 = 104 + 102 + 101$; in fact, it is within 2%.
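The whole scheme can be sketched in Python to reproduce the trace above. This is an illustrative sketch under stated assumptions: the names `trim` and `approx_subset_sum` are ours, the merge step removes duplicates, the target is assumed nonnegative, and `Fraction` is used so the trimming comparisons are exact.

```python
from fractions import Fraction
from heapq import merge


def trim(sorted_list, delta):
    """Drop every value that the previously kept value represents within 1 + delta."""
    trimmed = [sorted_list[0]]
    last = sorted_list[0]
    for y in sorted_list[1:]:
        if y > last * (1 + delta):
            trimmed.append(y)
            last = y
    return trimmed


def approx_subset_sum(items, target, epsilon):
    """Return a subset sum z with z <= target, following the scheme in the text."""
    n = len(items)
    delta = Fraction(epsilon) / (2 * n)  # trimming parameter epsilon / 2n
    current = [0]
    for x in items:
        shifted = [y + x for y in current]
        merged = []
        for y in merge(current, shifted):  # merge two sorted lists
            if not merged or merged[-1] != y:  # skip duplicate values
                merged.append(y)
        current = [y for y in trim(merged, delta) if y <= target]
    return max(current)


print(approx_subset_sum([104, 102, 201, 101], 308, Fraction(2, 5)))  # 302
```

Each pass through the loop does work linear in the current list length, so the total running time is governed by how long the trimmed lists can grow.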

Theorem  35.8 

Approx-Subset-Sum  is  a  fully  polynomial-time  approximation  scheme  for  the 
subset-sum  problem. 


Proof The operations of trimming $L_i$ in line 5 and removing from $L_i$ every element
that is greater than $t$ maintain the property that every element of $L_i$ is also a
member of $P_i$. Therefore, the value $z^*$ returned in line 8 is indeed the sum of some
subset of $S$. Let $y^* \in P_n$ denote an optimal solution to the subset-sum problem.
Then, from line 6, we know that $z^* \le y^*$. By inequality (35.1), we need to show
that $y^*/z^* \le 1 + \epsilon$. We must also show that the running time of this algorithm is
polynomial in both $1/\epsilon$ and the size of the input.

As Exercise 35.5-2 asks you to show, for every element $y$ in $P_i$ that is at most $t$,
there exists an element $z \in L_i$ such that

$$\frac{y}{(1 + \epsilon/2n)^i} \le z \le y . \tag{35.26}$$

Inequality (35.26) must hold for $y^* \in P_n$, and therefore there exists an element
$z \in L_n$ such that

$$\frac{y^*}{(1 + \epsilon/2n)^n} \le z \le y^* ,$$

and thus

$$\frac{y^*}{z} \le \left(1 + \frac{\epsilon}{2n}\right)^n . \tag{35.27}$$

Since there exists an element $z \in L_n$ fulfilling inequality (35.27), the inequality
must hold for $z^*$, which is the largest value in $L_n$; that is,

$$\frac{y^*}{z^*} \le \left(1 + \frac{\epsilon}{2n}\right)^n . \tag{35.28}$$


Now, we show that $y^*/z^* \le 1 + \epsilon$. We do so by showing that $(1 + \epsilon/2n)^n \le 1 + \epsilon$.
By equation (3.14), we have $\lim_{n \to \infty} (1 + \epsilon/2n)^n = e^{\epsilon/2}$. Exercise 35.5-3
asks you to show that

$$\frac{d}{dn} \left(1 + \frac{\epsilon}{2n}\right)^n > 0 . \tag{35.29}$$

Therefore, the function $(1 + \epsilon/2n)^n$ increases with $n$ as it approaches its limit
of $e^{\epsilon/2}$, and we have

$$\begin{aligned}
\left(1 + \frac{\epsilon}{2n}\right)^n &\le e^{\epsilon/2} \\
&\le 1 + \epsilon/2 + (\epsilon/2)^2 && \text{(by inequality (3.13))} \\
&\le 1 + \epsilon && \text{(by inequality (35.25))} . \qquad (35.30)
\end{aligned}$$

Combining inequalities (35.28) and (35.30) completes the analysis of the approximation
ratio.
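As a numerical sanity check of the chain of bounds in (35.30), the following sketch evaluates each quantity for a few sample values. It is only an illustration: the helper name `bound_chain` and the sample values of the parameters are assumptions, and floating-point evaluation is no substitute for the proof.

```python
import math


def bound_chain(epsilon, n):
    """Return the four quantities compared in inequality (35.30), in order."""
    compounded = (1 + epsilon / (2 * n)) ** n
    limit = math.exp(epsilon / 2)
    quadratic = 1 + epsilon / 2 + (epsilon / 2) ** 2
    return compounded, limit, quadratic, 1 + epsilon


for epsilon in (0.1, 0.5, 0.9):
    for n in (1, 10, 1000):
        a, b, c, d = bound_chain(epsilon, n)
        # Each quantity is bounded by the next one in the chain.
        assert a <= b <= c <= d
        print(epsilon, n, round(a, 6), round(b, 6), round(c, 6), round(d, 6))
```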

To show that APPROX-SUBSET-SUM is a fully polynomial-time approximation
scheme, we derive a bound on the length of $L_i$. After trimming, successive elements
$z$ and $z'$ of $L_i$ must have the relationship $z'/z > 1 + \epsilon/2n$. That is, they must
differ by a factor of at least $1 + \epsilon/2n$. Each list, therefore, contains the value 0,
possibly the value 1, and up to $\lfloor \log_{1+\epsilon/2n} t \rfloor$ additional values. The number of
elements in each list $L_i$ is at most

$$\begin{aligned}
\log_{1+\epsilon/2n} t + 2 &= \frac{\ln t}{\ln(1 + \epsilon/2n)} + 2 \\
&\le \frac{2n(1 + \epsilon/2n) \ln t}{\epsilon} + 2 && \text{(by inequality (3.17))} \\
&< \frac{3n \ln t}{\epsilon} + 2 && \text{(by inequality (35.25))} .
\end{aligned}$$

This bound is polynomial in the size of the input (which is the number of bits $\lg t$
needed to represent $t$ plus the number of bits needed to represent the set $S$, which is
in turn polynomial in $n$) and in $1/\epsilon$. Since the running time of APPROX-SUBSET-SUM
is polynomial in the lengths of the $L_i$, we conclude that APPROX-SUBSET-SUM
is a fully polynomial-time approximation scheme. ■


Exercises 


35.5-1 

Prove equation (35.23). Then show that after executing line 5 of EXACT-SUBSET-SUM,
$L_i$ is a sorted list containing every element of $P_i$ whose value is not more
than $t$.


35.5-2 

Using induction on $i$, prove inequality (35.26).


35.5-3 

Prove inequality (35.29).


35.5-4

How  would  you  modify  the  approximation  scheme  presented  in  this  section  to  find 
a  good  approximation  to  the  smallest  value  not  less  than  t  that  is  a  sum  of  some 
subset  of  the  given  input  list? 


35.5-5 

Modify  the  Approx-Subset-Sum  procedure  to  also  return  the  subset  of  S  that 
sums  to  the  value  z*. 


Problems 


35-1  Bin  packing 

Suppose that we are given a set of $n$ objects, where the size $s_i$ of the $i$th object
satisfies $0 < s_i < 1$. We wish to pack all the objects into the minimum number of
unit-size bins. Each bin can hold any subset of the objects whose total size does
not exceed 1.

a. Prove that the problem of determining the minimum number of bins required is
NP-hard. (Hint: Reduce from the subset-sum problem.)

The first-fit heuristic takes each object in turn and places it into the first bin that
can accommodate it. Let $S = \sum_{i=1}^{n} s_i$.

b. Argue that the optimal number of bins required is at least $\lceil S \rceil$.

c. Argue that the first-fit heuristic leaves at most one bin less than half full.

d. Prove that the number of bins used by the first-fit heuristic is never more
than $\lceil 2S \rceil$.

e. Prove an approximation ratio of 2 for the first-fit heuristic.

f. Give an efficient implementation of the first-fit heuristic, and analyze its running
time.

35-2  Approximating  the  size  of  a  maximum  clique 

Let $G = (V, E)$ be an undirected graph. For any $k \ge 1$, define $G^{(k)}$ to be the
undirected graph $(V^{(k)}, E^{(k)})$, where $V^{(k)}$ is the set of all ordered $k$-tuples of vertices
from $V$ and $E^{(k)}$ is defined so that $(v_1, v_2, \ldots, v_k)$ is adjacent to $(w_1, w_2, \ldots, w_k)$
if and only if for $i = 1, 2, \ldots, k$, either vertex $v_i$ is adjacent to $w_i$ in $G$, or else
$v_i = w_i$.

a. Prove that the size of the maximum clique in $G^{(k)}$ is equal to the $k$th power of
the size of the maximum clique in $G$.

b. Argue that if there is an approximation algorithm that has a constant approximation
ratio for finding a maximum-size clique, then there is a polynomial-time
approximation scheme for the problem.

35-3  Weighted  set-covering  problem 

Suppose that we generalize the set-covering problem so that each set $S_i$ in the
family $\mathcal{F}$ has an associated weight $w_i$ and the weight of a cover $\mathcal{C}$ is $\sum_{S_i \in \mathcal{C}} w_i$.
We wish to determine a minimum-weight cover. (Section 35.3 handles the case in
which $w_i = 1$ for all $i$.)

Show how to generalize the greedy set-covering heuristic in a natural manner
to provide an approximate solution for any instance of the weighted set-covering
problem. Show that your heuristic has an approximation ratio of $H(d)$, where $d$ is
the maximum size of any set $S_i$.

35-4  Maximum  matching 

Recall  that  for  an  undirected  graph  G,  a  matching  is  a  set  of  edges  such  that  no 
two  edges  in  the  set  are  incident  on  the  same  vertex.  In  Section  26.3,  we  saw  how 
to  find  a  maximum  matching  in  a  bipartite  graph.  In  this  problem,  we  will  look  at 
matchings  in  undirected  graphs  in  general  (i.e.,  the  graphs  are  not  required  to  be 
bipartite). 

a.  A  maximal  matching  is  a  matching  that  is  not  a  proper  subset  of  any  other 
matching.  Show  that  a  maximal  matching  need  not  be  a  maximum  matching  by 
exhibiting  an  undirected  graph  G  and  a  maximal  matching  M  in  G  that  is  not  a 
maximum  matching.  (Hint:  You  can  find  such  a  graph  with  only  four  vertices.) 

b. Consider an undirected graph $G = (V, E)$. Give an $O(E)$-time greedy algorithm
to find a maximal matching in $G$.

In this problem, we shall concentrate on a polynomial-time approximation algorithm
for maximum matching. Whereas the fastest known algorithm for maximum
matching takes superlinear (but polynomial) time, the approximation algorithm
here will run in linear time. You will show that the linear-time greedy algorithm
for maximal matching in part (b) is a 2-approximation algorithm for maximum
matching.

c. Show that the size of a maximum matching in $G$ is a lower bound on the size
of any vertex cover for $G$.

d. Consider a maximal matching $M$ in $G = (V, E)$. Let

$$T = \{v \in V : \text{some edge in } M \text{ is incident on } v\} .$$

What can you say about the subgraph of $G$ induced by the vertices of $G$ that
are not in $T$?

e. Conclude from part (d) that $2|M|$ is the size of a vertex cover for $G$.

f. Using parts (c) and (e), prove that the greedy algorithm in part (b) is a
2-approximation algorithm for maximum matching.

35-5  Parallel  machine  scheduling 

In the parallel-machine-scheduling problem, we are given $n$ jobs, $J_1, J_2, \ldots, J_n$,
where each job $J_k$ has an associated nonnegative processing time of $p_k$. We are
also given $m$ identical machines, $M_1, M_2, \ldots, M_m$. Any job can run on any machine.
A schedule specifies, for each job $J_k$, the machine on which it runs and
the time period during which it runs. Each job $J_k$ must run on some machine $M_i$
for $p_k$ consecutive time units, and during that time period no other job may run
on $M_i$. Let $C_k$ denote the completion time of job $J_k$, that is, the time at which
job $J_k$ completes processing. Given a schedule, we define $C_{\max} = \max_{1 \le j \le n} C_j$ to
be the makespan of the schedule. The goal is to find a schedule whose makespan
is minimum.

For example, suppose that we have two machines $M_1$ and $M_2$ and that we have
four jobs $J_1, J_2, J_3, J_4$, with $p_1 = 2$, $p_2 = 12$, $p_3 = 4$, and $p_4 = 5$. Then one
possible schedule runs, on machine $M_1$, job $J_1$ followed by job $J_2$, and on machine
$M_2$, it runs job $J_4$ followed by job $J_3$. For this schedule, $C_1 = 2$, $C_2 = 14$,
$C_3 = 9$, $C_4 = 5$, and $C_{\max} = 14$. An optimal schedule runs $J_2$ on machine $M_1$, and
it runs jobs $J_1$, $J_3$, and $J_4$ on machine $M_2$. For this schedule, $C_1 = 2$, $C_2 = 12$,
$C_3 = 6$, $C_4 = 11$, and $C_{\max} = 12$.
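The completion times in this example can be recomputed with a short Python sketch. It is illustrative only: the dictionary-based schedule representation and the names `completion_times` and `makespan` are assumptions, and each machine is assumed to run its jobs back to back starting at time 0 with no idle time.

```python
def completion_times(processing_time, schedule):
    """Map each job to its completion time.

    schedule maps a machine name to the ordered list of jobs it runs,
    assumed to run consecutively from time 0.
    """
    completion = {}
    for jobs in schedule.values():
        clock = 0
        for job in jobs:
            clock += processing_time[job]
            completion[job] = clock
    return completion


def makespan(processing_time, schedule):
    """Return the largest completion time over all jobs."""
    return max(completion_times(processing_time, schedule).values())


p = {1: 2, 2: 12, 3: 4, 4: 5}
first = {"M1": [1, 2], "M2": [4, 3]}
optimal = {"M1": [2], "M2": [1, 3, 4]}
print(makespan(p, first))    # 14
print(makespan(p, optimal))  # 12
```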

Given a parallel-machine-scheduling problem, we let $C^*_{\max}$ denote the makespan
of an optimal schedule.

a. Show that the optimal makespan is at least as large as the greatest processing
time, that is,

$$C^*_{\max} \ge \max_{1 \le k \le n} p_k .$$

b. Show that the optimal makespan is at least as large as the average machine load,
that is,

$$C^*_{\max} \ge \frac{1}{m} \sum_{1 \le k \le n} p_k .$$

Suppose that we use the following greedy algorithm for parallel machine scheduling:
whenever a machine is idle, schedule any job that has not yet been scheduled.

c. Write pseudocode to implement this greedy algorithm. What is the running
time of your algorithm?

d. For the schedule returned by the greedy algorithm, show that

$$C_{\max} \le \frac{1}{m} \sum_{1 \le k \le n} p_k + \max_{1 \le k \le n} p_k .$$

Conclude that this algorithm is a polynomial-time 2-approximation algorithm.

35-6 Approximating a maximum spanning tree

Let $G = (V, E)$ be an undirected graph with distinct edge weights $w(u, v)$ on each
edge $(u, v) \in E$. For each vertex $v \in V$, let $\max(v) = \max_{(u,v) \in E} \{w(u, v)\}$ be
the maximum-weight edge incident on that vertex. Let $S_G = \{\max(v) : v \in V\}$
be the set of maximum-weight edges incident on each vertex, and let $T_G$ be the
maximum-weight spanning tree of $G$, that is, the spanning tree of maximum total
weight. For any subset of edges $E' \subseteq E$, define $w(E') = \sum_{(u,v) \in E'} w(u, v)$.

a. Give an example of a graph with at least 4 vertices for which $S_G = T_G$.

b. Give an example of a graph with at least 4 vertices for which $S_G \ne T_G$.

c. Prove that $S_G \subseteq T_G$ for any graph $G$.

d. Prove that $w(T_G) \ge w(S_G)/2$ for any graph $G$.

e. Give an $O(V + E)$-time algorithm to compute a 2-approximation to the maximum
spanning tree.

35-7 An approximation algorithm for the 0-1 knapsack problem

Recall the knapsack problem from Section 16.2. There are $n$ items, where the $i$th
item is worth $v_i$ dollars and weighs $w_i$ pounds. We are also given a knapsack
that can hold at most $W$ pounds. Here, we add the further assumptions that each
weight $w_i$ is at most $W$ and that the items are indexed in monotonically decreasing
order of their values: $v_1 \ge v_2 \ge \cdots \ge v_n$.

In the 0-1 knapsack problem, we wish to find a subset of the items whose total
weight is at most $W$ and whose total value is maximum. The fractional knapsack
problem is like the 0-1 knapsack problem, except that we are allowed to take a
fraction of each item, rather than being restricted to taking either all or none of
each item. If we take a fraction $x_i$ of item $i$, where $0 \le x_i \le 1$, we contribute
$x_i w_i$ to the weight of the knapsack and receive value $x_i v_i$. Our goal is to develop
a polynomial-time 2-approximation algorithm for the 0-1 knapsack problem.

In order to design a polynomial-time algorithm, we consider restricted instances
of the 0-1 knapsack problem. Given an instance $I$ of the knapsack problem, we
form restricted instances $I_j$, for $j = 1, 2, \ldots, n$, by removing items $1, 2, \ldots, j - 1$
and requiring the solution to include item $j$ (all of item $j$ in both the fractional
and 0-1 knapsack problems). No items are removed in instance $I_1$. For instance $I_j$,
let $P_j$ denote an optimal solution to the 0-1 problem and $Q_j$ denote an optimal
solution to the fractional problem.

a. Argue that an optimal solution to instance $I$ of the 0-1 knapsack problem is one
of $\{P_1, P_2, \ldots, P_n\}$.

b. Prove that we can find an optimal solution $Q_j$ to the fractional problem for
instance $I_j$ by including item $j$ and then using the greedy algorithm in which
at each step, we take as much as possible of the unchosen item in the set
$\{j + 1, j + 2, \ldots, n\}$ with maximum value per pound $v_i/w_i$.

c.  Prove  that  we  can  always  construct  an  optimal  solution  Qj  to  the  fractional 
problem  for  instance  Ij  that  includes  at  most  one  item  fractionally.  That  is,  for 
all  items  except  possibly  one,  we  either  include  all  of  the  item  or  none  of  the 
item  in  the  knapsack. 

d. Given an optimal solution $Q_j$ to the fractional problem for instance $I_j$, form
solution $R_j$ from $Q_j$ by deleting any fractional items from $Q_j$. Let $v(S)$ denote
the total value of items taken in a solution $S$. Prove that $v(R_j) \ge v(Q_j)/2 \ge
v(P_j)/2$.

e. Give a polynomial-time algorithm that returns a maximum-value solution from
the set $\{R_1, R_2, \ldots, R_n\}$, and prove that your algorithm is a polynomial-time
2-approximation algorithm for the 0-1 knapsack problem.


Chapter  notes 

Although methods that do not necessarily compute exact solutions have been
known for thousands of years (for example, methods to approximate the value
of $\pi$), the notion of an approximation algorithm is much more recent. Hochbaum
[172] credits Garey, Graham, and Ullman [128] and Johnson [190] with formalizing
the concept of a polynomial-time approximation algorithm. The first such
algorithm is often credited to Graham [149].

Since this early work, thousands of approximation algorithms have been designed
for a wide range of problems, and there is a wealth of literature on this
field. Recent texts by Ausiello et al. [26], Hochbaum [172], and Vazirani [345]
deal exclusively with approximation algorithms, as do surveys by Shmoys [315]
and Klein and Young [207]. Several other texts, such as Garey and Johnson [129]
and Papadimitriou and Steiglitz [271], have significant coverage of approximation
algorithms as well. Lawler, Lenstra, Rinnooy Kan, and Shmoys [225] provide an
extensive treatment of approximation algorithms for the traveling-salesman problem.

Papadimitriou and Steiglitz attribute the algorithm APPROX-VERTEX-COVER
to F. Gavril and M. Yannakakis. The vertex-cover problem has been studied extensively
(Hochbaum [172] lists 16 different approximation algorithms for this problem),
but all the approximation ratios are at least $2 - o(1)$.

The algorithm APPROX-TSP-TOUR appears in a paper by Rosenkrantz, Stearns,
and Lewis [298]. Christofides improved on this algorithm and gave a 3/2-approximation
algorithm for the traveling-salesman problem with the triangle inequality.
Arora [22] and Mitchell [257] have shown that if the points are in the euclidean
plane, there is a polynomial-time approximation scheme. Theorem 35.3 is due to
Sahni and Gonzalez [301].

The analysis of the greedy heuristic for the set-covering problem is modeled
after the proof published by Chvátal [68] of a more general result; the basic result
as presented here is due to Johnson [190] and Lovász [238].

The algorithm APPROX-SUBSET-SUM and its analysis are loosely modeled after
related approximation algorithms for the knapsack and subset-sum problems by
Ibarra and Kim [187].

Problem 35-7 is a combinatorial version of a more general result on approximating
knapsack-type integer programs by Bienstock and McClosky [45].

The randomized algorithm for MAX-3-CNF satisfiability is implicit in the work
of Johnson [190]. The weighted vertex-cover algorithm is by Hochbaum [171].
Section 35.4 only touches on the power of randomization and linear programming
in the design of approximation algorithms. A combination of these two ideas
yields a technique called "randomized rounding," which formulates a problem as
an integer linear program, solves the linear-programming relaxation, and interprets
the variables in the solution as probabilities. These probabilities then help guide
the solution of the original problem. This technique was first used by Raghavan
and Thompson [290], and it has had many subsequent uses. (See Motwani, Naor,
and Raghavan [261] for a survey.) Several other notable recent ideas in the field
of approximation algorithms include the primal-dual method (see Goemans and
Williamson [135] for a survey), finding sparse cuts for use in divide-and-conquer
algorithms [229], and the use of semidefinite programming [134].

As mentioned in the chapter notes for Chapter 34, recent results in probabilistically
checkable proofs have led to lower bounds on the approximability of many
problems, including several in this chapter. In addition to the references there,
the chapter by Arora and Lund [23] contains a good description of the relationship
between probabilistically checkable proofs and the hardness of approximating
various problems.


VIII  Appendix:  Mathematical  Background 


Introduction 


When  we  analyze  algorithms,  we  often  need  to  draw  upon  a  body  of  mathematical 
tools.  Some  of  these  tools  are  as  simple  as  high-school  algebra,  but  others  may  be 
new to you. In Part I, we saw how to manipulate asymptotic notations and solve
recurrences.  This  appendix  comprises  a  compendium  of  several  other  concepts  and 
methods  we  use  to  analyze  algorithms.  As  noted  in  the  introduction  to  Part  I,  you 
may  have  seen  much  of  the  material  in  this  appendix  before  having  read  this  book 
(although  the  specific  notational  conventions  we  use  might  occasionally  differ  from 
those  you  have  seen  elsewhere).  Hence,  you  should  treat  this  appendix  as  reference 
material.  As  in  the  rest  of  this  book,  however,  we  have  included  exercises  and 
problems,  in  order  for  you  to  improve  your  skills  in  these  areas. 

Appendix  A  offers  methods  for  evaluating  and  bounding  summations,  which 
occur  frequently  in  the  analysis  of  algorithms.  Many  of  the  formulas  here  appear 
in  any  calculus  text,  but  you  will  find  it  convenient  to  have  these  methods  compiled 
in  one  place. 

Appendix  B  contains  basic  definitions  and  notations  for  sets,  relations,  functions, 
graphs,  and  trees.  It  also  gives  some  basic  properties  of  these  mathematical  objects. 

Appendix C begins with elementary principles of counting: permutations, combinations,
and the like. The remainder contains definitions and properties of basic
probability. Most of the algorithms in this book require no probability for their
analysis, and thus you can easily omit the latter sections of the chapter on a first
reading, even without skimming them. Later, when you encounter a probabilistic
analysis that you want to understand better, you will find Appendix C well organized
for reference purposes.

Appendix D defines matrices, their operations, and some of their basic properties.
You have probably seen most of this material already if you have taken a
course in linear algebra, but you might find it helpful to have one place to look for
our notation and definitions.


A 


Summations 


When an algorithm contains an iterative control construct such as a while or for
loop, we can express its running time as the sum of the times spent on each execution
of the body of the loop. For example, we found in Section 2.2 that the $j$th
iteration of insertion sort took time proportional to $j$ in the worst case. By adding
up the time spent on each iteration, we obtained the summation (or series)

$$\sum_{j=2}^{n} j .$$

When we evaluated this summation, we attained a bound of $\Theta(n^2)$ on the worst-case
running time of the algorithm. This example illustrates why you should know
how to manipulate and bound summations.

Section A.1 lists several basic formulas involving summations. Section A.2 offers
useful techniques for bounding summations. We present the formulas in Section A.1
without proof, though proofs for some of them appear in Section A.2 to
illustrate the methods of that section. You can find most of the other proofs in any
calculus text.


A.1 Summation formulas and properties


Given a sequence $a_1, a_2, \ldots, a_n$ of numbers, where $n$ is a nonnegative integer, we
can write the finite sum $a_1 + a_2 + \cdots + a_n$ as

$$\sum_{k=1}^{n} a_k .$$

If $n = 0$, the value of the summation is defined to be 0. The value of a finite series
is always well defined, and we can add its terms in any order.

Given an infinite sequence $a_1, a_2, \ldots$ of numbers, we can write the infinite sum
$a_1 + a_2 + \cdots$ as

$$\sum_{k=1}^{\infty} a_k ,$$

which we interpret to mean

$$\lim_{n \to \infty} \sum_{k=1}^{n} a_k .$$

If the limit does not exist, the series diverges; otherwise, it converges. The terms
of a convergent series cannot always be added in any order. We can, however,
rearrange the terms of an absolutely convergent series, that is, a series $\sum_{k=1}^{\infty} a_k$
for which the series $\sum_{k=1}^{\infty} |a_k|$ also converges.


Linearity 


For any real number $c$ and any finite sequences $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$,

$$\sum_{k=1}^{n} (c a_k + b_k) = c \sum_{k=1}^{n} a_k + \sum_{k=1}^{n} b_k .$$

The linearity property also applies to infinite convergent series.

We can exploit the linearity property to manipulate summations incorporating
asymptotic notation. For example,

$$\sum_{k=1}^{n} \Theta(f(k)) = \Theta\left(\sum_{k=1}^{n} f(k)\right) .$$

In this equation, the $\Theta$-notation on the left-hand side applies to the variable $k$, but
on the right-hand side, it applies to $n$. We can also apply such manipulations to
infinite convergent series.

Arithmetic  series 

The summation

$$\sum_{k=1}^{n} k = 1 + 2 + \cdots + n$$

is an arithmetic series and has the value

$$\begin{aligned}
\sum_{k=1}^{n} k &= \frac{1}{2} n(n+1) && \text{(A.1)} \\
&= \Theta(n^2) . && \text{(A.2)}
\end{aligned}$$

Sums of squares and cubes

We have the following summations of squares and cubes:

$$\sum_{k=0}^{n} k^2 = \frac{n(n+1)(2n+1)}{6} , \tag{A.3}$$

$$\sum_{k=0}^{n} k^3 = \frac{n^2 (n+1)^2}{4} . \tag{A.4}$$

Geometric  series 

For real $x \ne 1$, the summation

$$\sum_{k=0}^{n} x^k = 1 + x + x^2 + \cdots + x^n$$

is a geometric or exponential series and has the value

$$\sum_{k=0}^{n} x^k = \frac{x^{n+1} - 1}{x - 1} . \tag{A.5}$$

When the summation is infinite and $|x| < 1$, we have the infinite decreasing geometric
series

$$\sum_{k=0}^{\infty} x^k = \frac{1}{1 - x} . \tag{A.6}$$

Harmonic  series 

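For positive integers $n$, the $n$th harmonic number is

$$\begin{aligned}
H_n &= 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots + \frac{1}{n} \\
&= \sum_{k=1}^{n} \frac{1}{k} \\
&= \ln n + O(1) . && \text{(A.7)}
\end{aligned}$$

(We shall prove a related bound in Section A.2.)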
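To see the $O(1)$ term in (A.7) concretely, the sketch below compares the harmonic number with $\ln n$ for a few values of $n$. It is only a floating-point illustration; the function name `harmonic` and the sample sizes are assumptions.

```python
import math


def harmonic(n):
    """Return the n-th harmonic number as a float."""
    return sum(1.0 / k for k in range(1, n + 1))


for n in (1, 10, 1000, 100000):
    gap = harmonic(n) - math.log(n)
    # The difference stays between 0 and 1 for these n,
    # consistent with H_n = ln n + O(1).
    assert 0 < gap <= 1
    print(n, round(gap, 6))
```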


Integrating  and  differentiating  series 

By integrating or differentiating the formulas above, additional formulas arise. For
example, by differentiating both sides of the infinite geometric series (A.6) and
multiplying by $x$, we get

$$\sum_{k=0}^{\infty} k x^k = \frac{x}{(1 - x)^2} \tag{A.8}$$

for $|x| < 1$.
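Formula (A.8) can be checked numerically by truncating the infinite sum. The sketch below is an illustration under assumptions: the function names, the number of terms, and the tolerance are arbitrary choices, and a truncated floating-point sum only approximates the series.

```python
def partial_sum(x, terms):
    """Sum of k * x^k for k = 0..terms-1."""
    return sum(k * x ** k for k in range(terms))


def closed_form(x):
    """Right-hand side of (A.8); valid for |x| < 1."""
    return x / (1 - x) ** 2


for x in (0.5, -0.3, 0.9):
    # With 2000 terms the neglected tail is far below the tolerance.
    assert abs(partial_sum(x, 2000) - closed_form(x)) < 1e-9
```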


Telescoping  series 


For any sequence $a_0, a_1, \ldots, a_n$,

$$\sum_{k=1}^{n} (a_k - a_{k-1}) = a_n - a_0 , \tag{A.9}$$

since each of the terms $a_1, a_2, \ldots, a_{n-1}$ is added in exactly once and subtracted out
exactly once. We say that the sum telescopes. Similarly,

$$\sum_{k=0}^{n-1} (a_k - a_{k+1}) = a_0 - a_n .$$

As an example of a telescoping sum, consider the series

$$\sum_{k=1}^{n-1} \frac{1}{k(k+1)} .$$

Since we can rewrite each term as

$$\frac{1}{k(k+1)} = \frac{1}{k} - \frac{1}{k+1} ,$$

we get

$$\sum_{k=1}^{n-1} \frac{1}{k(k+1)} = \sum_{k=1}^{n-1} \left( \frac{1}{k} - \frac{1}{k+1} \right) = 1 - \frac{1}{n} .$$
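The telescoping example can be verified exactly with rational arithmetic. This Python sketch is illustrative; the function name `telescoping_sum` and the tested range are assumptions.

```python
from fractions import Fraction


def telescoping_sum(n):
    """Sum of 1 / (k (k + 1)) for k = 1..n-1, as an exact fraction."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n))


for n in range(1, 30):
    # The sum collapses to 1 - 1/n, as the telescoping argument predicts.
    assert telescoping_sum(n) == 1 - Fraction(1, n)
```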


Products 

We can write the finite product $a_1 a_2 \cdots a_n$ as

$$\prod_{k=1}^{n} a_k .$$

If $n = 0$, the value of the product is defined to be 1. We can convert a formula with
a product to a formula with a summation by using the identity

$$\lg \left( \prod_{k=1}^{n} a_k \right) = \sum_{k=1}^{n} \lg a_k .$$


Exercises

A.1-1

Find a simple formula for $\sum_{k=1}^{n} (2k - 1)$.

A.1-2 ★

Show that $\sum_{k=1}^{n} 1/(2k - 1) = \ln(\sqrt{n}) + O(1)$ by manipulating the harmonic
series.

A.1-3

Show that $\sum_{k=0}^{\infty} k^2 x^k = x(1 + x)/(1 - x)^3$ for $0 < |x| < 1$.

A.1-4 ★

Show that $\sum_{k=0}^{\infty} (k - 1)/2^k = 0$.

A.1-5 ★

Evaluate the sum $\sum_{k=1}^{\infty} (2k + 1) x^{2k}$.

A.1-6

Prove that $\sum_{k=1}^{n} O(f_k(i)) = O\left(\sum_{k=1}^{n} f_k(i)\right)$ by using the linearity property of
summations.

A.1-7

Evaluate the product $\prod_{k=1}^{n} 2 \cdot 4^k$.

A.1-8 ★

Evaluate the product $\prod_{k=2}^{n} (1 - 1/k^2)$.


A.2  Bounding  summations 

We have many techniques at our disposal for bounding the summations that describe
the running times of algorithms. Here are some of the most frequently used
methods.

Mathematical  induction 

The most basic way to evaluate a series is to use mathematical induction. As an
example, let us prove that the arithmetic series $\sum_{k=1}^{n} k$ evaluates to $\frac{1}{2} n(n + 1)$. We
can easily verify this assertion for $n = 1$. We make the inductive assumption that
it holds for $n$, and we prove that it holds for $n + 1$. We have

$$\begin{aligned}
\sum_{k=1}^{n+1} k &= \sum_{k=1}^{n} k + (n + 1) \\
&= \frac{1}{2} n(n + 1) + (n + 1) \\
&= \frac{1}{2} (n + 1)(n + 2) .
\end{aligned}$$

You don't always need to guess the exact value of a summation in order to use
mathematical induction. Instead, you can use induction to prove a bound on a summation.
As an example, let us prove that the geometric series $\sum_{k=0}^{n} 3^k$ is $O(3^n)$.
More specifically, let us prove that $\sum_{k=0}^{n} 3^k \le c 3^n$ for some constant $c$. For the
initial condition $n = 0$, we have $\sum_{k=0}^{0} 3^k = 1 \le c \cdot 1$ as long as $c \ge 1$. Assuming
that the bound holds for $n$, let us prove that it holds for $n + 1$. We have

$$\begin{aligned}
\sum_{k=0}^{n+1} 3^k &= \sum_{k=0}^{n} 3^k + 3^{n+1} \\
&\le c 3^n + 3^{n+1} && \text{(by the inductive hypothesis)} \\
&= \left( \frac{1}{3} + \frac{1}{c} \right) c 3^{n+1} \\
&\le c 3^{n+1}
\end{aligned}$$

as long as $(1/3 + 1/c) \le 1$ or, equivalently, $c \ge 3/2$. Thus, $\sum_{k=0}^{n} 3^k = O(3^n)$,
as we wished to show.
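The constant found by the induction can be checked directly. The Python sketch below is an illustration; the function name `geometric_3`, the tested range, and the comparison constant 7/5 are assumptions chosen to show that a smaller constant stops working.

```python
from fractions import Fraction


def geometric_3(n):
    """Sum of 3^k for k = 0..n."""
    return sum(3 ** k for k in range(n + 1))


c = Fraction(3, 2)
for n in range(40):
    # The bound from the induction holds for every n tried.
    assert geometric_3(n) <= c * 3 ** n

# A constant below 3/2 eventually fails, for example 7/5 at n = 2.
assert geometric_3(2) > Fraction(7, 5) * 3 ** 2
```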

We have to be careful when we use asymptotic notation to prove bounds by induction.
Consider the following fallacious proof that $\sum_{k=1}^{n} k = O(n)$. Certainly,
$\sum_{k=1}^{1} k = O(1)$. Assuming that the bound holds for $n$, we now prove it for $n + 1$:

$$\begin{aligned}
\sum_{k=1}^{n+1} k &= \sum_{k=1}^{n} k + (n + 1) \\
&= O(n) + (n + 1) && \Longleftarrow \text{wrong!!} \\
&= O(n + 1) .
\end{aligned}$$

The bug in the argument is that the "constant" hidden by the "big-oh" grows with $n$
and thus is not constant. We have not shown that the same constant works for all $n$.


Bounding  the  terms 

We can sometimes obtain a good upper bound on a series by bounding each term
of the series, and it often suffices to use the largest term to bound the others. For
example, a quick upper bound on the arithmetic series (A.1) is

\[
\sum_{k=1}^{n} k \le \sum_{k=1}^{n} n = n^2 .
\]

In general, for a series $\sum_{k=1}^{n} a_k$, if we let
$a_{\max} = \max_{1 \le k \le n} a_k$, then

\[
\sum_{k=1}^{n} a_k \le n \cdot a_{\max} .
\]


The technique of bounding each term in a series by the largest term is a weak
method when the series can in fact be bounded by a geometric series. Given the
series $\sum_{k=0}^{n} a_k$, suppose that $a_{k+1}/a_k \le r$ for all $k \ge 0$,
where $0 < r < 1$ is a constant. We can bound the sum by an infinite decreasing
geometric series, since $a_k \le a_0 r^k$, and thus

\[
\begin{aligned}
\sum_{k=0}^{n} a_k &\le \sum_{k=0}^{\infty} a_0 r^k \\
                   &= a_0 \sum_{k=0}^{\infty} r^k \\
                   &= a_0 \frac{1}{1-r} .
\end{aligned}
\]


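We can apply this method to bound the summation $\sum_{k=1}^{\infty} (k/3^k)$. In
order to start the summation at $k = 0$, we rewrite it as
$\sum_{k=0}^{\infty} ((k+1)/3^{k+1})$. The first term ($a_0$) is $1/3$, and the
ratio ($r$) of consecutive terms is

\[
\frac{(k+2)/3^{k+2}}{(k+1)/3^{k+1}} = \frac{1}{3} \cdot \frac{k+2}{k+1} \le \frac{2}{3}
\]

for all $k \ge 0$. Thus, we have

\[
\begin{aligned}
\sum_{k=1}^{\infty} \frac{k}{3^k} &= \sum_{k=0}^{\infty} \frac{k+1}{3^{k+1}} \\
                                  &\le \frac{1}{3} \cdot \frac{1}{1 - 2/3} \\
                                  &= 1 .
\end{aligned}
\]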
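As a quick check of this bound, the sketch below (helper names are our own) verifies the ratio condition $a_{k+1}/a_k \le 2/3$ for the shifted terms and confirms that the partial sums of $\sum k/3^k$ stay below 1. It uses exact rationals so that rounding cannot mask a violation. The partial sums actually stay below $3/4$, the value given by the standard closed form $\sum_{k \ge 0} k x^k = x/(1-x)^2$ at $x = 1/3$, so the geometric bound of 1 is valid but not tight.

```python
from fractions import Fraction


def term(k: int) -> Fraction:
    """Return the k-th term k / 3^k as an exact rational."""
    return Fraction(k, 3**k)


def partial_sum(n: int) -> Fraction:
    """Return sum_{k=1}^{n} k / 3^k."""
    return sum((term(k) for k in range(1, n + 1)), Fraction(0))


def ratio_bounded(limit: int) -> bool:
    """Check a_{k+1} / a_k <= 2/3 for a_k = (k+1) / 3^(k+1), 0 <= k < limit."""
    return all(term(k + 2) / term(k + 1) <= Fraction(2, 3) for k in range(limit))


if __name__ == "__main__":
    print(ratio_bounded(100))
    print(float(partial_sum(40)))
```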
A common bug in applying this method is to show that the ratio of consecutive
terms is less than 1 and then to assume that the summation is bounded by a
geometric series. An example is the infinite harmonic series, which diverges since

\[
\begin{aligned}
\sum_{k=1}^{\infty} \frac{1}{k} &= \lim_{n \to \infty} \sum_{k=1}^{n} \frac{1}{k} \\
                                &= \lim_{n \to \infty} \Theta(\lg n) \\
                                &= \infty .
\end{aligned}
\]

The ratio of the $(k+1)$st and $k$th terms in this series is $k/(k+1) < 1$, but the
series is not bounded by a decreasing geometric series. To bound a series by a
geometric series, we must show that there is an $r < 1$, which is a constant, such
that the ratio of all pairs of consecutive terms never exceeds $r$. In the harmonic
series, no such $r$ exists because the ratio becomes arbitrarily close to 1.


Splitting  summations 


One way to obtain bounds on a difficult summation is to express the series as the
sum of two or more series by partitioning the range of the index and then to bound
each of the resulting series. For example, suppose we try to find a lower bound
on the arithmetic series $\sum_{k=1}^{n} k$, which we have already seen has an
upper bound of $n^2$. We might attempt to bound each term in the summation by the
smallest term, but since that term is 1, we get a lower bound of $n$ for the
summation, far off from our upper bound of $n^2$.

We can obtain a better lower bound by first splitting the summation. Assume for
convenience that $n$ is even. We have

\[
\begin{aligned}
\sum_{k=1}^{n} k &= \sum_{k=1}^{n/2} k + \sum_{k=n/2+1}^{n} k \\
                 &\ge \sum_{k=1}^{n/2} 0 + \sum_{k=n/2+1}^{n} (n/2) \\
                 &= (n/2)^2 \\
                 &= \Omega(n^2) ,
\end{aligned}
\]

which is an asymptotically tight bound, since $\sum_{k=1}^{n} k = O(n^2)$.
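The two bounds on the arithmetic series can be compared directly. This minimal sketch (names are illustrative) computes the exact sum, the split lower bound $(n/2)^2$, and the term-wise upper bound $n^2$ for even $n$, under the same evenness assumption the text makes.

```python
def arithmetic_sum(n: int) -> int:
    """Return 1 + 2 + ... + n."""
    return sum(range(1, n + 1))


def split_lower_bound(n: int) -> int:
    """Lower bound (n/2)^2: drop the first half, bound each later term by n/2."""
    if n % 2 != 0:
        raise ValueError("this sketch assumes n is even, as in the text")
    return (n // 2) ** 2


def sandwiched(n: int) -> bool:
    """Check (n/2)^2 <= sum_{k=1}^{n} k <= n^2."""
    return split_lower_bound(n) <= arithmetic_sum(n) <= n * n


if __name__ == "__main__":
    print(all(sandwiched(n) for n in range(2, 201, 2)))
```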

For a summation arising from the analysis of an algorithm, we can often split
the summation and ignore a constant number of the initial terms. Generally, this
technique applies when each term $a_k$ in a summation $\sum_{k=0}^{n} a_k$ is
independent of $n$. Then for any constant $k_0 > 0$, we can write

\[
\begin{aligned}
\sum_{k=0}^{n} a_k &= \sum_{k=0}^{k_0-1} a_k + \sum_{k=k_0}^{n} a_k \\
                   &= \Theta(1) + \sum_{k=k_0}^{n} a_k ,
\end{aligned}
\]

since the initial terms of the summation are all constant and there are a constant
number of them. We can then use other methods to bound $\sum_{k=k_0}^{n} a_k$. This
technique applies to infinite summations as well. For example, to find an asymptotic
upper bound on

\[
\sum_{k=0}^{\infty} \frac{k^2}{2^k} ,
\]

we observe that the ratio of consecutive terms is

\[
\frac{(k+1)^2/2^{k+1}}{k^2/2^k} = \frac{(k+1)^2}{2k^2} \le \frac{8}{9}
\]

if $k \ge 3$. Thus, the summation can be split into

\[
\begin{aligned}
\sum_{k=0}^{\infty} \frac{k^2}{2^k} &= \sum_{k=0}^{2} \frac{k^2}{2^k} + \sum_{k=3}^{\infty} \frac{k^2}{2^k} \\
  &\le \sum_{k=0}^{2} \frac{k^2}{2^k} + \frac{9}{8} \sum_{k=0}^{\infty} \left(\frac{8}{9}\right)^k \\
  &= O(1) ,
\end{aligned}
\]

since the first summation has a constant number of terms and the second summation
is a decreasing geometric series.
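To make the constant concrete, the following sketch (our own helper names) evaluates the split bound for $\sum k^2/2^k$ exactly: the head $\sum_{k=0}^{2} k^2/2^k = 3/2$ plus the geometric tail $\frac{9}{8} \cdot \frac{1}{1 - 8/9} = 81/8$. It also checks that the ratio condition holds from $k = 3$ onward but fails at $k = 2$, which is why the split is made at $k_0 = 3$.

```python
from fractions import Fraction


def term(k: int) -> Fraction:
    """Return k^2 / 2^k as an exact rational."""
    return Fraction(k * k, 2**k)


def ratio_ok(k: int) -> bool:
    """Check a_{k+1} / a_k <= 8/9 (requires k >= 1 so that a_k != 0)."""
    return term(k + 1) / term(k) <= Fraction(8, 9)


def split_upper_bound() -> Fraction:
    """Head (k = 0..2) plus geometric tail: a_3 * 1 / (1 - 8/9)."""
    head = sum((term(k) for k in range(3)), Fraction(0))
    tail = term(3) / (1 - Fraction(8, 9))
    return head + tail


def partial_sum(n: int) -> Fraction:
    """Return sum_{k=0}^{n} k^2 / 2^k."""
    return sum((term(k) for k in range(n + 1)), Fraction(0))


if __name__ == "__main__":
    print(split_upper_bound())
    print(float(partial_sum(60)))
```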

The technique of splitting summations can help us determine asymptotic bounds
in much more difficult situations. For example, we can obtain a bound of $O(\lg n)$
on the harmonic series (A.7):

\[
H_n = \sum_{k=1}^{n} \frac{1}{k} .
\]

We do so by splitting the range 1 to $n$ into $\lfloor \lg n \rfloor + 1$ pieces and
upper-bounding the contribution of each piece by 1. For
$i = 0, 1, \ldots, \lfloor \lg n \rfloor$, the $i$th piece consists of the terms
starting at $1/2^i$ and going up to but not including $1/2^{i+1}$. The last
piece might contain terms not in the original harmonic series, and thus we have

\[
\begin{aligned}
\sum_{k=1}^{n} \frac{1}{k}
  &\le \sum_{i=0}^{\lfloor \lg n \rfloor} \sum_{j=0}^{2^i-1} \frac{1}{2^i + j} \\
  &\le \sum_{i=0}^{\lfloor \lg n \rfloor} \sum_{j=0}^{2^i-1} \frac{1}{2^i} \\
  &= \sum_{i=0}^{\lfloor \lg n \rfloor} 1 \\
  &\le \lg n + 1 .
\end{aligned}
\tag{A.10}
\]
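The piecewise argument behind (A.10) translates directly into a check. In the sketch below (function names are ours), `piece_sum(i)` is the inner sum $\sum_{j=0}^{2^i-1} 1/(2^i + j)$, which should never exceed 1, and `piecewise_bound(n)` is the number of pieces $\lfloor \lg n \rfloor + 1$, computed with the integer `bit_length` method rather than floating-point logarithms.

```python
from fractions import Fraction


def harmonic(n: int) -> Fraction:
    """Return H_n = 1 + 1/2 + ... + 1/n as an exact rational."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def piece_sum(i: int) -> Fraction:
    """Return the i-th piece: sum of 1/(2^i + j) for j = 0 .. 2^i - 1."""
    return sum((Fraction(1, 2**i + j) for j in range(2**i)), Fraction(0))


def piecewise_bound(n: int) -> int:
    """Return floor(lg n) + 1 for n >= 1, the number of pieces."""
    return n.bit_length()


if __name__ == "__main__":
    print(all(piece_sum(i) <= 1 for i in range(10)))
    print(all(harmonic(n) <= piecewise_bound(n) for n in range(1, 200)))
```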


Approximation  by  integrals 

When a summation has the form $\sum_{k=m}^{n} f(k)$, where $f(k)$ is a
monotonically increasing function, we can approximate it by integrals:

\[
\int_{m-1}^{n} f(x)\,dx \le \sum_{k=m}^{n} f(k) \le \int_{m}^{n+1} f(x)\,dx .
\tag{A.11}
\]

Figure A.1 justifies this approximation. The summation is represented as the area
of the rectangles in the figure, and the integral is the shaded region under the curve.
When $f(k)$ is a monotonically decreasing function, we can use a similar method
to provide the bounds

\[
\int_{m}^{n+1} f(x)\,dx \le \sum_{k=m}^{n} f(k) \le \int_{m-1}^{n} f(x)\,dx .
\tag{A.12}
\]

The integral approximation (A.12) gives a tight estimate for the $n$th harmonic
number. For a lower bound, we obtain

\[
\sum_{k=1}^{n} \frac{1}{k} \ge \int_{1}^{n+1} \frac{dx}{x} = \ln(n+1) .
\tag{A.13}
\]

For the upper bound, we derive the inequality

\[
\sum_{k=2}^{n} \frac{1}{k} \le \int_{1}^{n} \frac{dx}{x} = \ln n ,
\]

which yields the bound

\[
\sum_{k=1}^{n} \frac{1}{k} \le \ln n + 1 .
\tag{A.14}
\]

Figure A.1 Approximation of $\sum_{k=m}^{n} f(k)$ by integrals. The area of each
rectangle is shown within the rectangle, and the total rectangle area represents the
value of the summation. The integral is represented by the shaded area under the
curve. By comparing areas in (a), we get
$\int_{m-1}^{n} f(x)\,dx \le \sum_{k=m}^{n} f(k)$, and then by shifting the
rectangles one unit to the right, we get
$\sum_{k=m}^{n} f(k) \le \int_{m}^{n+1} f(x)\,dx$ in (b).
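Bounds (A.13) and (A.14) can be checked numerically. The sketch below is a floating-point illustration only (names are ours); it evaluates $H_n$ with `math.fsum` and compares it against $\ln(n+1)$ and $\ln n + 1$.

```python
import math


def harmonic(n: int) -> float:
    """Return H_n as a float, using fsum to limit rounding error."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


def integral_bounds(n: int) -> tuple[float, float]:
    """Return (ln(n+1), ln(n) + 1), the bounds from (A.13) and (A.14)."""
    return math.log(n + 1), math.log(n) + 1.0


def within_bounds(n: int) -> bool:
    lower, upper = integral_bounds(n)
    return lower <= harmonic(n) <= upper


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, integral_bounds(n)[0], harmonic(n), integral_bounds(n)[1])
```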


Exercises 

A.2-1

Show that $\sum_{k=1}^{n} 1/k^2$ is bounded above by a constant.

A.2-2

Find an asymptotic upper bound on the summation

\[
\sum_{k=0}^{\lfloor \lg n \rfloor} \lceil n/2^k \rceil .
\]

A.2-3

Show that the $n$th harmonic number is $\Omega(\lg n)$ by splitting the summation.

A.2-4

Approximate $\sum_{k=1}^{n} k^3$ with an integral.

A.2-5

Why didn't we use the integral approximation (A.12) directly on
$\sum_{k=1}^{n} 1/k$ to obtain an upper bound on the $n$th harmonic number?


Problems 


A-1 Bounding summations

Give asymptotically tight bounds on the following summations. Assume that $r \ge 0$
and $s \ge 0$ are constants.

a. $\sum_{k=1}^{n} k^r$.

b. $\sum_{k=1}^{n} \lg^s k$.

c. $\sum_{k=1}^{n} k^r \lg^s k$.


Appendix  notes 

Knuth  [209]  provides  an  excellent  reference  for  the  material  presented  here.  You 
can  find  basic  properties  of  series  in  any  good  calculus  book,  such  as  Apostol  [18] 
or  Thomas  et  al.  [334]. 


B 


Sets, Etc.


Many chapters of this book touch on the elements of discrete mathematics. This
appendix reviews more completely the notations, definitions, and elementary
properties of sets, relations, functions, graphs, and trees. If you are already well
versed in this material, you can probably just skim this chapter.


B.1 Sets


A set is a collection of distinguishable objects, called its members or elements. If
an object $x$ is a member of a set $S$, we write $x \in S$ (read "$x$ is a member of $S$"
or, more briefly, "$x$ is in $S$"). If $x$ is not a member of $S$, we write $x \notin S$. We
can describe a set by explicitly listing its members as a list inside braces. For
example, we can define a set $S$ to contain precisely the numbers 1, 2, and 3 by
writing $S = \{1, 2, 3\}$. Since 2 is a member of the set $S$, we can write $2 \in S$, and
since 4 is not a member, we have $4 \notin S$. A set cannot contain the same object more
than once,1 and its elements are not ordered. Two sets $A$ and $B$ are equal, written
$A = B$, if they contain the same elements. For example,
$\{1, 2, 3, 1\} = \{1, 2, 3\} = \{3, 2, 1\}$.

We  adopt  special  notations  for  frequently  encountered  sets: 

•  $\emptyset$ denotes the empty set, that is, the set containing no members.

•  $\mathbb{Z}$ denotes the set of integers, that is, the set $\{\ldots, -2, -1, 0, 1, 2, \ldots\}$.

•  $\mathbb{R}$ denotes the set of real numbers.

•  $\mathbb{N}$ denotes the set of natural numbers, that is, the set $\{0, 1, 2, \ldots\}$.2


1 A  variation  of  a  set,  which  can  contain  the  same  object  more  than  once,  is  called  a  multiset. 

2Some  authors  start  the  natural  numbers  with  1  instead  of  0.  The  modern  trend  seems  to  be  to  start 
with  0. 
If all the elements of a set $A$ are contained in a set $B$, that is, if $x \in A$ implies
$x \in B$, then we write $A \subseteq B$ and say that $A$ is a subset of $B$. A set $A$ is a
proper subset of $B$, written $A \subset B$, if $A \subseteq B$ but $A \ne B$. (Some authors
use the symbol "$\subset$" to denote the ordinary subset relation, rather than the
proper-subset relation.) For any set $A$, we have $A \subseteq A$. For two sets $A$ and $B$,
we have $A = B$ if and only if $A \subseteq B$ and $B \subseteq A$. For any three sets $A$,
$B$, and $C$, if $A \subseteq B$ and $B \subseteq C$, then $A \subseteq C$. For any set $A$,
we have $\emptyset \subseteq A$.

We sometimes define sets in terms of other sets. Given a set $A$, we can define a
set $B \subseteq A$ by stating a property that distinguishes the elements of $B$. For
example, we can define the set of even integers by
$\{x : x \in \mathbb{Z} \text{ and } x/2 \text{ is an integer}\}$. The
colon in this notation is read "such that." (Some authors use a vertical bar in place
of the colon.)

Given two sets $A$ and $B$, we can also define new sets by applying set operations:

•  The intersection of sets $A$ and $B$ is the set

   $A \cap B = \{x : x \in A \text{ and } x \in B\}$ .

•  The union of sets $A$ and $B$ is the set

   $A \cup B = \{x : x \in A \text{ or } x \in B\}$ .

•  The difference between two sets $A$ and $B$ is the set

   $A - B = \{x : x \in A \text{ and } x \notin B\}$ .

Set  operations  obey  the  following  laws: 

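Empty set laws:

\[
A \cap \emptyset = \emptyset , \qquad A \cup \emptyset = A .
\]

Idempotency laws:

\[
A \cap A = A , \qquad A \cup A = A .
\]

Commutative laws:

\[
A \cap B = B \cap A , \qquad A \cup B = B \cup A .
\]

Associative laws:

\[
\begin{aligned}
A \cap (B \cap C) &= (A \cap B) \cap C , \\
A \cup (B \cup C) &= (A \cup B) \cup C .
\end{aligned}
\]

Distributive laws:

\[
\begin{aligned}
A \cap (B \cup C) &= (A \cap B) \cup (A \cap C) , \\
A \cup (B \cap C) &= (A \cup B) \cap (A \cup C) .
\end{aligned}
\tag{B.1}
\]

Absorption laws:

\[
\begin{aligned}
A \cap (A \cup B) &= A , \\
A \cup (A \cap B) &= A .
\end{aligned}
\]

DeMorgan's laws:

\[
\begin{aligned}
A - (B \cap C) &= (A - B) \cup (A - C) , \\
A - (B \cup C) &= (A - B) \cap (A - C) .
\end{aligned}
\tag{B.2}
\]

Figure B.1 A Venn diagram illustrating the first of DeMorgan's laws (B.2). Each of
the sets $A$, $B$, and $C$ is represented as a circle.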
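Python's built-in `set` type supports intersection (`&`), union (`|`), and difference (`-`), so the laws above can be spot-checked on concrete sets. The sketch below (the function name and sample sets are our own) tests the distributive, absorption, and DeMorgan's laws. Passing on examples illustrates the laws but does not prove them.

```python
def check_set_laws(A: set, B: set, C: set) -> bool:
    """Return True if the distributive, absorption, and DeMorgan laws hold."""
    distributive = (
        A & (B | C) == (A & B) | (A & C)
        and A | (B & C) == (A | B) & (A | C)
    )
    absorption = A & (A | B) == A and A | (A & B) == A
    de_morgan = (
        A - (B & C) == (A - B) | (A - C)
        and A - (B | C) == (A - B) & (A - C)
    )
    return distributive and absorption and de_morgan


if __name__ == "__main__":
    print(check_set_laws({1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6}))
```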


Figure B.1 illustrates the first of DeMorgan's laws, using a Venn diagram: a
graphical picture in which sets are represented as regions of the plane.

Often, all the sets under consideration are subsets of some larger set $U$ called the
universe. For example, if we are considering various sets made up only of integers,
the set $\mathbb{Z}$ of integers is an appropriate universe. Given a universe $U$, we
define the complement of a set $A$ as
$\overline{A} = U - A = \{x : x \in U \text{ and } x \notin A\}$. For any set
$A \subseteq U$, we have the following laws:

\[
\overline{\overline{A}} = A , \qquad
A \cap \overline{A} = \emptyset , \qquad
A \cup \overline{A} = U .
\]
We can rewrite DeMorgan's laws (B.2) with set complements. For any two sets
$B, C \subseteq U$, we have

\[
\begin{aligned}
\overline{B \cap C} &= \overline{B} \cup \overline{C} , \\
\overline{B \cup C} &= \overline{B} \cap \overline{C} .
\end{aligned}
\]

Two sets $A$ and $B$ are disjoint if they have no elements in common, that is, if
$A \cap B = \emptyset$. A collection $\mathcal{S} = \{S_i\}$ of nonempty sets forms a
partition of a set $S$ if

•  the sets are pairwise disjoint, that is, $S_i, S_j \in \mathcal{S}$ and $i \ne j$
   imply $S_i \cap S_j = \emptyset$, and

•  their union is $S$, that is,

   \[ S = \bigcup_{S_i \in \mathcal{S}} S_i . \]

In other words, $\mathcal{S}$ forms a partition of $S$ if each element of $S$ appears in
exactly one $S_i \in \mathcal{S}$.
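The two conditions in the definition, together with the requirement that the blocks be nonempty, can be tested mechanically. The following is a minimal sketch (`is_partition` is a hypothetical helper, and it assumes the blocks are supplied as a list with no block repeated):

```python
from itertools import combinations


def is_partition(blocks, S) -> bool:
    """Return True if `blocks` is a partition of the set S."""
    blocks = [set(b) for b in blocks]
    # Every block must be nonempty.
    if any(len(b) == 0 for b in blocks):
        return False
    # Distinct blocks must share no elements.
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(blocks, 2))
    # The union of the blocks must be exactly S.
    union = set().union(*blocks)
    return pairwise_disjoint and union == set(S)


if __name__ == "__main__":
    print(is_partition([{1, 2}, {3}], {1, 2, 3}))
    print(is_partition([{1, 2}, {2, 3}], {1, 2, 3}))
```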

The number of elements in a set is the cardinality (or size) of the set, denoted $|S|$.
Two sets have the same cardinality if their elements can be put into a one-to-one
correspondence. The cardinality of the empty set is $|\emptyset| = 0$. If the cardinality
of a set is a natural number, we say the set is finite; otherwise, it is infinite. An
infinite set that can be put into a one-to-one correspondence with the natural
numbers $\mathbb{N}$ is countably infinite; otherwise, it is uncountable. For example,
the integers $\mathbb{Z}$ are countable, but the reals $\mathbb{R}$ are uncountable.

For any two finite sets $A$ and $B$, we have the identity

\[
|A \cup B| = |A| + |B| - |A \cap B| ,
\tag{B.3}
\]

from which we can conclude that

\[
|A \cup B| \le |A| + |B| .
\]

If $A$ and $B$ are disjoint, then $|A \cap B| = 0$ and thus $|A \cup B| = |A| + |B|$. If
$A \subseteq B$, then $|A| \le |B|$.

A  finite  set  of  n  elements  is  sometimes  called  an  n-set.  A  1-set  is  called  a 
singleton.  A  subset  of  k  elements  of  a  set  is  sometimes  called  a  k-subset. 

We denote the set of all subsets of a set $S$, including the empty set and $S$ itself,
by $2^S$; we call $2^S$ the power set of $S$. For example,
$2^{\{a,b\}} = \{\emptyset, \{a\}, \{b\}, \{a, b\}\}$.
The power set of a finite set $S$ has cardinality $2^{|S|}$ (see Exercise B.1-5).
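A power set can be enumerated by listing the subsets of each size. The sketch below (`power_set` is an illustrative name) uses `itertools.combinations` and represents each subset as a `frozenset` so that subsets can themselves be members of a set.

```python
from itertools import combinations


def power_set(S) -> set:
    """Return the set of all subsets of S, each as a frozenset."""
    items = list(S)
    return {
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    }


if __name__ == "__main__":
    subsets = power_set({"a", "b"})
    print(len(subsets))
    print(sorted(sorted(s) for s in subsets))
```

For a finite set the number of subsets produced matches the $2^{|S|}$ count stated in the text.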

We sometimes care about setlike structures in which the elements are ordered.
An ordered pair of two elements $a$ and $b$ is denoted $(a, b)$ and is defined formally
as the set $(a, b) = \{a, \{a, b\}\}$. Thus, the ordered pair $(a, b)$ is not the same as the
ordered pair $(b, a)$.
The Cartesian product of two sets $A$ and $B$, denoted $A \times B$, is the set of all
ordered pairs such that the first element of the pair is an element of $A$ and the
second is an element of $B$. More formally,

\[
A \times B = \{(a, b) : a \in A \text{ and } b \in B\} .
\]

For example,
$\{a, b\} \times \{a, b, c\} = \{(a,a), (a,b), (a,c), (b,a), (b,b), (b,c)\}$. When
$A$ and $B$ are finite sets, the cardinality of their Cartesian product is

\[
|A \times B| = |A| \cdot |B| .
\tag{B.4}
\]

The Cartesian product of $n$ sets $A_1, A_2, \ldots, A_n$ is the set of $n$-tuples

\[
A_1 \times A_2 \times \cdots \times A_n
  = \{(a_1, a_2, \ldots, a_n) : a_i \in A_i \text{ for } i = 1, 2, \ldots, n\} ,
\]

whose cardinality is

\[
|A_1 \times A_2 \times \cdots \times A_n| = |A_1| \cdot |A_2| \cdots |A_n|
\]

if all sets are finite. We denote an $n$-fold Cartesian product over a single set $A$ by
the set

\[
A^n = A \times A \times \cdots \times A ,
\]

whose cardinality is $|A^n| = |A|^n$ if $A$ is finite. We can also view an $n$-tuple as a
finite sequence of length $n$ (see page 1166).
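The standard-library function `itertools.product` enumerates Cartesian products, which makes the cardinality identities easy to check on small sets. In this sketch the wrapper names `cartesian` and `cartesian_power` are our own.

```python
from itertools import product


def cartesian(*sets) -> set:
    """Return A_1 x A_2 x ... x A_n as a set of tuples."""
    return set(product(*sets))


def cartesian_power(A, n: int) -> set:
    """Return the n-fold product A^n as a set of n-tuples."""
    return set(product(A, repeat=n))


if __name__ == "__main__":
    pairs = cartesian({"a", "b"}, {"a", "b", "c"})
    print(sorted(pairs))
    print(len(pairs) == 2 * 3)                      # identity (B.4)
    print(len(cartesian_power({0, 1, 2}, 4)) == 3**4)
```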

Exercises 


B.1-1

Draw Venn diagrams that illustrate the first of the distributive laws (B.1).


B.1-2

Prove the generalization of DeMorgan's laws to any finite collection of sets:

\[
\begin{aligned}
\overline{A_1 \cap A_2 \cap \cdots \cap A_n} &= \overline{A_1} \cup \overline{A_2} \cup \cdots \cup \overline{A_n} , \\
\overline{A_1 \cup A_2 \cup \cdots \cup A_n} &= \overline{A_1} \cap \overline{A_2} \cap \cdots \cap \overline{A_n} .
\end{aligned}
\]

B.1-3 *

Prove the generalization of equation (B.3), which is called the principle of
inclusion and exclusion:

\[
\begin{aligned}
|A_1 \cup A_2 \cup \cdots \cup A_n| ={}& |A_1| + |A_2| + \cdots + |A_n| \\
 & - |A_1 \cap A_2| - |A_1 \cap A_3| - \cdots && \text{(all pairs)} \\
 & + |A_1 \cap A_2 \cap A_3| + \cdots && \text{(all triples)} \\
 & \;\;\vdots \\
 & + (-1)^{n-1} |A_1 \cap A_2 \cap \cdots \cap A_n| .
\end{aligned}
\]


B.1-4

Show that the set of odd natural numbers is countable.


B.1-5

Show that for any finite set $S$, the power set $2^S$ has $2^{|S|}$ elements (that is, there
are $2^{|S|}$ distinct subsets of $S$).


B.1-6

Give an inductive definition for an $n$-tuple by extending the set-theoretic definition
for an ordered pair.


B.2  Relations 

A binary relation $R$ on two sets $A$ and $B$ is a subset of the Cartesian product
$A \times B$. If $(a, b) \in R$, we sometimes write $a \mathrel{R} b$. When we say that $R$ is
a binary relation on a set $A$, we mean that $R$ is a subset of $A \times A$. For example,
the "less than" relation on the natural numbers is the set
$\{(a, b) : a, b \in \mathbb{N} \text{ and } a < b\}$. An $n$-ary
relation on sets $A_1, A_2, \ldots, A_n$ is a subset of $A_1 \times A_2 \times \cdots \times A_n$.

A binary relation $R \subseteq A \times A$ is reflexive if

   $a \mathrel{R} a$

for all $a \in A$. For example, "$=$" and "$\le$" are reflexive relations on $\mathbb{N}$,
but "$<$" is not. The relation $R$ is symmetric if

   $a \mathrel{R} b$ implies $b \mathrel{R} a$

for all $a, b \in A$. For example, "$=$" is symmetric, but "$<$" and "$\le$" are not. The
relation $R$ is transitive if

   $a \mathrel{R} b$ and $b \mathrel{R} c$ imply $a \mathrel{R} c$

for all $a, b, c \in A$. For example, the relations "$<$," "$\le$," and "$=$" are transitive, but
the relation $R = \{(a, b) : a, b \in \mathbb{N} \text{ and } a = b - 1\}$ is not, since
$3 \mathrel{R} 4$ and $4 \mathrel{R} 5$ do not imply $3 \mathrel{R} 5$.

A relation that is reflexive, symmetric, and transitive is an equivalence relation.
For example, "$=$" is an equivalence relation on the natural numbers, but "$<$" is not.
If $R$ is an equivalence relation on a set $A$, then for $a \in A$, the equivalence class
of $a$ is the set $[a] = \{b \in A : a \mathrel{R} b\}$, that is, the set of all elements
equivalent to $a$. For example, if we define
$R = \{(a, b) : a, b \in \mathbb{N} \text{ and } a + b \text{ is an even number}\}$,
then $R$ is an equivalence relation, since $a + a$ is even (reflexive), $a + b$ is even
implies $b + a$ is even (symmetric), and $a + b$ is even and $b + c$ is even imply
$a + c$ is even (transitive). The equivalence class of 4 is $[4] = \{0, 2, 4, 6, \ldots\}$, and
the equivalence class of 3 is $[3] = \{1, 3, 5, 7, \ldots\}$. A basic theorem of equivalence
classes is the following.

Theorem B.1 (An equivalence relation is the same as a partition)

The equivalence classes of any equivalence relation $R$ on a set $A$ form a partition
of $A$, and any partition of $A$ determines an equivalence relation on $A$ for which the
sets in the partition are the equivalence classes.

Proof  For the first part of the proof, we must show that the equivalence classes
of $R$ are nonempty, pairwise-disjoint sets whose union is $A$. Because $R$ is
reflexive, $a \in [a]$, and so the equivalence classes are nonempty; moreover, since
every element $a \in A$ belongs to the equivalence class $[a]$, the union of the
equivalence classes is $A$. It remains to show that the equivalence classes are
pairwise disjoint, that is, if two equivalence classes $[a]$ and $[b]$ have an element $c$
in common, then they are in fact the same set. Suppose that $a \mathrel{R} c$ and
$b \mathrel{R} c$. By symmetry, $c \mathrel{R} b$, and by transitivity, $a \mathrel{R} b$.
Thus, for any arbitrary element $x \in [a]$, we have $x \mathrel{R} a$ and, by transitivity,
$x \mathrel{R} b$, and thus $[a] \subseteq [b]$. Similarly, $[b] \subseteq [a]$, and thus
$[a] = [b]$.

For the second part of the proof, let $\mathcal{A} = \{A_i\}$ be a partition of $A$, and define
$R = \{(a, b) : \text{there exists } i \text{ such that } a \in A_i \text{ and } b \in A_i\}$.
We claim that $R$ is an equivalence relation on $A$. Reflexivity holds, since $a \in A_i$
implies $a \mathrel{R} a$. Symmetry holds, because if $a \mathrel{R} b$, then $a$ and $b$ are
in the same set $A_i$, and hence $b \mathrel{R} a$. If $a \mathrel{R} b$ and $b \mathrel{R} c$,
then all three elements are in the same set $A_i$, and thus $a \mathrel{R} c$ and transitivity
holds. To see that the sets in the partition are the equivalence classes of $R$, observe
that if $a \in A_i$, then $x \in [a]$ implies $x \in A_i$, and $x \in A_i$ implies $x \in [a]$. ∎
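Theorem B.1 can be illustrated on a finite set. The sketch below (helper names are ours) groups the elements of a finite set into equivalence classes by comparing each element with one representative per class. Using a single representative is valid only when the supplied predicate really is an equivalence relation, which the code assumes rather than checks. The example applies it to the "$a + b$ is even" relation from the text, restricted to $\{0, 1, \ldots, 7\}$.

```python
def equivalence_classes(A, related) -> list:
    """Partition A into classes, assuming `related` is an equivalence relation."""
    classes = []
    for a in A:
        for cls in classes:
            representative = next(iter(cls))
            if related(a, representative):
                cls.add(a)
                break
        else:
            # No existing class contains an element related to a.
            classes.append({a})
    return classes


def sum_is_even(a: int, b: int) -> bool:
    """The relation a R b if and only if a + b is even."""
    return (a + b) % 2 == 0


if __name__ == "__main__":
    print(equivalence_classes(range(8), sum_is_even))
```

The resulting classes are the even and the odd numbers in the range, which together form a partition of it.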

A binary relation $R$ on a set $A$ is antisymmetric if

   $a \mathrel{R} b$ and $b \mathrel{R} a$ imply $a = b$ .

For example, the "$\le$" relation on the natural numbers is antisymmetric, since
$a \le b$ and $b \le a$ imply $a = b$. A relation that is reflexive, antisymmetric, and
transitive is a partial order, and we call a set on which a partial order is defined a
partially ordered set. For example, the relation "is a descendant of" is a partial order
on the set of all people (if we view individuals as being their own descendants).

In a partially ordered set $A$, there may be no single "maximum" element $a$ such
that $b \mathrel{R} a$ for all $b \in A$. Instead, the set may contain several maximal
elements $a$ such that for no $b \in A$, where $b \ne a$, is it the case that
$a \mathrel{R} b$. For example, a collection of different-sized boxes may contain several
maximal boxes that don't fit inside any other box, yet it has no single "maximum" box
into which any other box will fit.3

A relation $R$ on a set $A$ is a total relation if for all $a, b \in A$, we have
$a \mathrel{R} b$ or $b \mathrel{R} a$ (or both), that is, if every pairing of elements of $A$
is related by $R$. A partial order that is also a total relation is a total order or linear
order. For example, the relation "$\le$" is a total order on the natural numbers, but the
"is a descendant of" relation is not a total order on the set of all people, since there
are individuals neither of whom is descended from the other. A total relation that is
transitive, but not necessarily reflexive and antisymmetric, is a total preorder.

Exercises 


B.2-1 

Prove that the subset relation "$\subseteq$" on all subsets of $\mathbb{Z}$ is a partial
order but not a total order.

B.2-2

Show that for any positive integer $n$, the relation "equivalent modulo $n$" is an
equivalence relation on the integers. (We say that $a \equiv b \pmod{n}$ if there exists
an integer $q$ such that $a - b = qn$.) Into what equivalence classes does this relation
partition the integers?


B.2-3 

Give  examples  of  relations  that  are 

a.  reflexive  and  symmetric  but  not  transitive, 

b.  reflexive  and  transitive  but  not  symmetric, 

c.  symmetric  and  transitive  but  not  reflexive. 


3To  be  precise,  in  order  for  the  “fit  inside”  relation  to  be  a  partial  order,  we  need  to  view  a  box  as 
fitting  inside  itself. 
B.2-4

Let  S  be  a  finite  set,  and  let  R  be  an  equivalence  relation  on  S  x  S.  Show  that  if 
in  addition  R  is  antisymmetric,  then  the  equivalence  classes  of  S  with  respect  to  R 
are  singletons. 


B.2-5 

Professor  Narcissus  claims  that  if  a  relation  R  is  symmetric  and  transitive,  then  it  is 
also  reflexive.  He  offers  the  following  proof.  By  symmetry,  a  R  b  implies  b  R  a. 
Transitivity,  therefore,  implies  a  R  a.  Is  the  professor  correct? 


B.3  Functions 

Given two sets $A$ and $B$, a function $f$ is a binary relation on $A$ and $B$ such that
for all $a \in A$, there exists precisely one $b \in B$ such that $(a, b) \in f$. The set $A$ is
called the domain of $f$, and the set $B$ is called the codomain of $f$. We sometimes
write $f : A \to B$; and if $(a, b) \in f$, we write $b = f(a)$, since $b$ is uniquely
determined by the choice of $a$.

Intuitively, the function $f$ assigns an element of $B$ to each element of $A$. No
element of $A$ is assigned two different elements of $B$, but the same element of $B$
can be assigned to two different elements of $A$. For example, the binary relation

   $f = \{(a, b) : a, b \in \mathbb{N} \text{ and } b = a \bmod 2\}$

is a function $f : \mathbb{N} \to \{0, 1\}$, since for each natural number $a$, there is
exactly one value $b$ in $\{0, 1\}$ such that $b = a \bmod 2$. For this example,
$0 = f(0)$, $1 = f(1)$, $0 = f(2)$, etc. In contrast, the binary relation

   $g = \{(a, b) : a, b \in \mathbb{N} \text{ and } a + b \text{ is even}\}$

is not a function, since $(1, 3)$ and $(1, 5)$ are both in $g$, and thus for the choice
$a = 1$, there is not precisely one $b$ such that $(a, b) \in g$.

Given a function $f : A \to B$, if $b = f(a)$, we say that $a$ is the argument of $f$
and that $b$ is the value of $f$ at $a$. We can define a function by stating its value for
every element of its domain. For example, we might define $f(n) = 2n$ for
$n \in \mathbb{N}$, which means $f = \{(n, 2n) : n \in \mathbb{N}\}$. Two functions $f$ and
$g$ are equal if they have the same domain and codomain and if, for all $a$ in the
domain, $f(a) = g(a)$.

A finite sequence of length $n$ is a function $f$ whose domain is the set of $n$
integers $\{0, 1, \ldots, n - 1\}$. We often denote a finite sequence by listing its values:
$\langle f(0), f(1), \ldots, f(n-1) \rangle$. An infinite sequence is a function whose
domain is the set $\mathbb{N}$ of natural numbers. For example, the Fibonacci sequence,
defined by recurrence (3.22), is the infinite sequence
$\langle 0, 1, 1, 2, 3, 5, 8, 13, 21, \ldots \rangle$.
When the domain of a function $f$ is a Cartesian product, we often omit the extra
parentheses surrounding the argument of $f$. For example, if we had a function
$f : A_1 \times A_2 \times \cdots \times A_n \to B$, we would write
$b = f(a_1, a_2, \ldots, a_n)$ instead of $b = f((a_1, a_2, \ldots, a_n))$. We also call
each $a_i$ an argument to the function $f$, though technically the (single) argument
to $f$ is the $n$-tuple $(a_1, a_2, \ldots, a_n)$.

If $f : A \to B$ is a function and $b = f(a)$, then we sometimes say that $b$ is the
image of $a$ under $f$. The image of a set $A' \subseteq A$ under $f$ is defined by

   $f(A') = \{b \in B : b = f(a) \text{ for some } a \in A'\}$ .

The range of $f$ is the image of its domain, that is, $f(A)$. For example, the range
of the function $f : \mathbb{N} \to \mathbb{N}$ defined by $f(n) = 2n$ is
$f(\mathbb{N}) = \{m : m = 2n \text{ for some } n \in \mathbb{N}\}$, in other words, the set
of nonnegative even integers.

A function is a surjection if its range is its codomain. For example, the function
$f(n) = \lfloor n/2 \rfloor$ is a surjective function from $\mathbb{N}$ to $\mathbb{N}$, since
every element in $\mathbb{N}$ appears as the value of $f$ for some argument. In contrast,
the function $f(n) = 2n$ is not a surjective function from $\mathbb{N}$ to $\mathbb{N}$, since
no argument to $f$ can produce 3 as a value. The function $f(n) = 2n$ is, however, a
surjective function from the natural numbers to the even numbers. A surjection
$f : A \to B$ is sometimes described as mapping $A$ onto $B$. When we say that $f$ is
onto, we mean that it is surjective.

A function $f : A \to B$ is an injection if distinct arguments to $f$ produce
distinct values, that is, if $a \ne a'$ implies $f(a) \ne f(a')$. For example, the function
$f(n) = 2n$ is an injective function from $\mathbb{N}$ to $\mathbb{N}$, since each even
number $b$ is the image under $f$ of at most one element of the domain, namely $b/2$.
The function $f(n) = \lfloor n/2 \rfloor$ is not injective, since the value 1 is produced
by two arguments: 2 and 3. An injection is sometimes called a one-to-one function.

A function $f : A \to B$ is a bijection if it is injective and surjective. For example,
the function $f(n) = (-1)^n \lceil n/2 \rceil$ is a bijection from $\mathbb{N}$ to $\mathbb{Z}$:

\[
\begin{aligned}
0 &\to 0 , \\
1 &\to -1 , \\
2 &\to 1 , \\
3 &\to -2 , \\
4 &\to 2 , \\
  &\;\;\vdots
\end{aligned}
\]

The function is injective, since no element of $\mathbb{Z}$ is the image of more than one
element of $\mathbb{N}$. It is surjective, since every element of $\mathbb{Z}$ appears as the
image of some element of $\mathbb{N}$. Hence, the function is bijective. A bijection is
sometimes called a one-to-one correspondence, since it pairs elements in the domain
and codomain. A bijection from a set $A$ to itself is sometimes called a permutation.
When a function $f$ is bijective, we define its inverse $f^{-1}$ as

   $f^{-1}(b) = a$ if and only if $f(a) = b$ .
For example, the inverse of the function $f(n) = (-1)^n \lceil n/2 \rceil$ is

\[
f^{-1}(m) =
\begin{cases}
2m & \text{if } m \ge 0 , \\
-2m - 1 & \text{if } m < 0 .
\end{cases}
\]
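The bijection $f(n) = (-1)^n \lceil n/2 \rceil$ and its inverse are simple enough to code directly. The sketch below (function names are ours) uses integer arithmetic for the ceiling and checks on a finite range that the two functions undo each other, which is what the definition of the inverse requires.

```python
def f(n: int) -> int:
    """Return (-1)^n * ceil(n / 2) for a natural number n."""
    half_up = (n + 1) // 2          # integer form of ceil(n / 2)
    return half_up if n % 2 == 0 else -half_up


def f_inverse(m: int) -> int:
    """Return the natural number n with f(n) == m."""
    return 2 * m if m >= 0 else -2 * m - 1


if __name__ == "__main__":
    print([f(n) for n in range(5)])
    print(all(f_inverse(f(n)) == n for n in range(1000)))
    print(all(f(f_inverse(m)) == m for m in range(-500, 500)))
```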

Exercises

B.3-1

Let $A$ and $B$ be finite sets, and let $f : A \to B$ be a function. Show that

a.  if $f$ is injective, then $|A| \le |B|$;

b.  if $f$ is surjective, then $|A| \ge |B|$.


B.3-2 

Is the function $f(x) = x + 1$ bijective when the domain and the codomain are
$\mathbb{N}$? Is it bijective when the domain and the codomain are $\mathbb{Z}$?


B.3-3 

Give  a  natural  definition  for  the  inverse  of  a  binary  relation  such  that  if  a  relation 
is  in  fact  a  bijective  function,  its  relational  inverse  is  its  functional  inverse. 

B.3-4  * 

Give  a  bijection  from  Z  to  Z  x  Z. 


B.4  Graphs 

This section presents two kinds of graphs: directed and undirected. Certain
definitions in the literature differ from those given here, but for the most part, the
differences are slight. Section 22.1 shows how we can represent graphs in
computer memory.

A directed graph (or digraph) $G$ is a pair $(V, E)$, where $V$ is a finite set and $E$
is a binary relation on $V$. The set $V$ is called the vertex set of $G$, and its elements
are called vertices (singular: vertex). The set $E$ is called the edge set of $G$, and its
elements are called edges. Figure B.2(a) is a pictorial representation of a directed
graph on the vertex set $\{1, 2, 3, 4, 5, 6\}$. Vertices are represented by circles in the
figure, and edges are represented by arrows. Note that self-loops (edges from a
vertex to itself) are possible.

In an undirected graph $G = (V, E)$, the edge set $E$ consists of unordered
pairs of vertices, rather than ordered pairs. That is, an edge is a set $\{u, v\}$, where
$u, v \in V$ and $u \ne v$. By convention, we use the notation $(u, v)$ for an edge, rather
than the set notation $\{u, v\}$, and we consider $(u, v)$ and $(v, u)$ to be the same edge.
In an undirected graph, self-loops are forbidden, and so every edge consists of two
distinct vertices. Figure B.2(b) is a pictorial representation of an undirected graph
on the vertex set $\{1, 2, 3, 4, 5, 6\}$.

Figure B.2 Directed and undirected graphs. (a) A directed graph $G = (V, E)$, where
$V = \{1, 2, 3, 4, 5, 6\}$ and
$E = \{(1,2), (2,2), (2,4), (2,5), (4,1), (4,5), (5,4), (6,3)\}$. The edge $(2, 2)$
is a self-loop. (b) An undirected graph $G = (V, E)$, where $V = \{1, 2, 3, 4, 5, 6\}$ and
$E = \{(1,2), (1,5), (2,5), (3,6)\}$. The vertex 4 is isolated. (c) The subgraph of the
graph in part (a) induced by the vertex set $\{1, 2, 3, 6\}$.

Many definitions for directed and undirected graphs are the same, although
certain terms have slightly different meanings in the two contexts. If (u, v) is an edge
in a directed graph G = (V, E), we say that (u, v) is incident from or leaves
vertex u and is incident to or enters vertex v. For example, the edges leaving
vertex 2 in Figure B.2(a) are (2, 2), (2, 4), and (2, 5). The edges entering vertex 2 are
(1, 2) and (2, 2). If (u, v) is an edge in an undirected graph G = (V, E), we say
that (u, v) is incident on vertices u and v. In Figure B.2(b), the edges incident on
vertex 2 are (1, 2) and (2, 5).

If (u, v) is an edge in a graph G = (V, E), we say that vertex v is adjacent to
vertex u. When the graph is undirected, the adjacency relation is symmetric. When
the graph is directed, the adjacency relation is not necessarily symmetric. If v is
adjacent to u in a directed graph, we sometimes write $u \to v$. In parts (a) and (b)
of Figure B.2, vertex 2 is adjacent to vertex 1, since the edge (1, 2) belongs to both
graphs. Vertex 1 is not adjacent to vertex 2 in Figure B.2(a), since the edge (2, 1)
does not belong to the graph.

The degree of a vertex in an undirected graph is the number of edges incident on
it. For example, vertex 2 in Figure B.2(b) has degree 2. A vertex whose degree is 0,
such as vertex 4 in Figure B.2(b), is isolated. In a directed graph, the out-degree
of a vertex is the number of edges leaving it, and the in-degree of a vertex is the
number of edges entering it. The degree of a vertex in a directed graph is its
in-degree plus its out-degree. Vertex 2 in Figure B.2(a) has in-degree 2, out-degree 3,
and degree 5.
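
The degree counts quoted above can be checked with a short Python sketch. The edge sets are copied from the caption of Figure B.2; the function names, and the choice of tuples for directed edges and `frozenset`s for undirected edges, are assumptions made only for this illustration.

```python
# Edge sets from the caption of Figure B.2.
directed_edges = {(1, 2), (2, 2), (2, 4), (2, 5), (4, 1), (4, 5), (5, 4), (6, 3)}
undirected_edges = {frozenset(e) for e in [(1, 2), (1, 5), (2, 5), (3, 6)]}


def out_degree(edges, v):
    """Number of directed edges leaving v."""
    return sum(1 for (u, _) in edges if u == v)


def in_degree(edges, v):
    """Number of directed edges entering v."""
    return sum(1 for (_, w) in edges if w == v)


def undirected_degree(edges, v):
    """Number of undirected edges incident on v."""
    return sum(1 for e in edges if v in e)


print(in_degree(directed_edges, 2), out_degree(directed_edges, 2))  # 2 3
print(undirected_degree(undirected_edges, 2))                       # 2
print(undirected_degree(undirected_edges, 4))                       # 0, so vertex 4 is isolated
```

Note that the self-loop (2, 2) contributes to both the in-degree and the out-degree of vertex 2.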

A path of length k from a vertex u to a vertex u' in a graph G = (V, E)
is a sequence $\langle v_0, v_1, v_2, \ldots, v_k \rangle$ of vertices such that $u = v_0$, $u' = v_k$, and
$(v_{i-1}, v_i) \in E$ for $i = 1, 2, \ldots, k$. The length of the path is the number of
edges in the path. The path contains the vertices $v_0, v_1, \ldots, v_k$ and the edges
$(v_0, v_1), (v_1, v_2), \ldots, (v_{k-1}, v_k)$. (There is always a 0-length path from u to u.) If
there is a path p from u to u', we say that u' is reachable from u via p, which we
sometimes write as $u \stackrel{p}{\leadsto} u'$ if G is directed. A path is simple⁴ if all vertices in the
path are distinct. In Figure B.2(a), the path $\langle 1, 2, 5, 4 \rangle$ is a simple path of length 3.
The path $\langle 2, 5, 4, 5 \rangle$ is not simple.
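
The two examples just given can be verified mechanically. The following is a minimal Python sketch for directed graphs, using the edge set of Figure B.2(a); the helper names `is_path` and `is_simple` are hypothetical.

```python
# Directed edge set of Figure B.2(a).
edges = {(1, 2), (2, 2), (2, 4), (2, 5), (4, 1), (4, 5), (5, 4), (6, 3)}


def is_path(edges, vertices):
    """True if every consecutive pair of vertices is a directed edge.

    A single vertex is accepted as the 0-length path from it to itself.
    """
    return all((a, b) in edges for a, b in zip(vertices, vertices[1:]))


def is_simple(vertices):
    """True if all vertices in the sequence are distinct."""
    return len(set(vertices)) == len(vertices)


p = [1, 2, 5, 4]
q = [2, 5, 4, 5]
print(is_path(edges, p), is_simple(p), len(p) - 1)  # True True 3
print(is_path(edges, q), is_simple(q))              # True False
```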

A subpath of path $p = \langle v_0, v_1, \ldots, v_k \rangle$ is a contiguous subsequence of its
vertices. That is, for any $0 \le i \le j \le k$, the subsequence of vertices $\langle v_i, v_{i+1}, \ldots, v_j \rangle$
is a subpath of p.

In a directed graph, a path $\langle v_0, v_1, \ldots, v_k \rangle$ forms a cycle if $v_0 = v_k$ and the
path contains at least one edge. The cycle is simple if, in addition, $v_1, v_2, \ldots, v_k$
are distinct. A self-loop is a cycle of length 1. Two paths $\langle v_0, v_1, v_2, \ldots, v_{k-1}, v_0 \rangle$
and $\langle v'_0, v'_1, v'_2, \ldots, v'_{k-1}, v'_0 \rangle$ form the same cycle if there exists an integer j such
that $v'_i = v_{(i+j) \bmod k}$ for $i = 0, 1, \ldots, k - 1$. In Figure B.2(a), the path $\langle 1, 2, 4, 1 \rangle$
forms the same cycle as the paths $\langle 2, 4, 1, 2 \rangle$ and $\langle 4, 1, 2, 4 \rangle$. This cycle is simple,
but the cycle $\langle 1, 2, 4, 5, 4, 1 \rangle$ is not. The cycle $\langle 2, 2 \rangle$ formed by the edge (2, 2) is
a self-loop. A directed graph with no self-loops is simple. In an undirected graph,
a path $\langle v_0, v_1, \ldots, v_k \rangle$ forms a cycle if $k \ge 3$ and $v_0 = v_k$; the cycle is simple if
$v_1, v_2, \ldots, v_k$ are distinct. For example, in Figure B.2(b), the path $\langle 1, 2, 5, 1 \rangle$ is a
simple cycle. A graph with no cycles is acyclic.
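
The "same cycle" condition is a statement about cyclic rotations, which the following Python sketch makes concrete. It assumes each cycle is given as a list of vertices whose first and last entries coincide; the name `same_cycle` is illustrative.

```python
def same_cycle(p, q):
    """Decide whether two closed paths form the same cycle.

    Each argument lists the vertices of a cycle with the start vertex repeated
    at the end. The paths match if, for some shift j, q[i] == p[(i + j) mod k]
    for all i in 0..k-1, where k is the number of edges.
    """
    if len(p) != len(q):
        return False
    a, b = p[:-1], q[:-1]  # drop the repeated endpoint
    k = len(a)
    return any(all(b[i] == a[(i + j) % k] for i in range(k)) for j in range(k))


print(same_cycle([1, 2, 4, 1], [2, 4, 1, 2]))  # True
print(same_cycle([1, 2, 4, 1], [4, 1, 2, 4]))  # True
print(same_cycle([1, 2, 4, 1], [1, 4, 2, 1]))  # False: opposite direction
```

The last call shows that, under this definition, traversing the same vertices in the reverse order is not a rotation and so does not count as the same cycle.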

An undirected graph is connected if every vertex is reachable from all other
vertices. The connected components of a graph are the equivalence classes of
vertices under the “is reachable from” relation. The graph in Figure B.2(b) has
three connected components: {1, 2, 5}, {3, 6}, and {4}. Every vertex in {1, 2, 5} is
reachable from every other vertex in {1, 2, 5}. An undirected graph is connected
if it has exactly one connected component. The edges of a connected component
are those that are incident on only the vertices of the component; in other words,
edge (u, v) is an edge of a connected component only if both u and v are vertices
of the component.
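
To illustrate, the following Python sketch recovers the connected components of the graph in Figure B.2(b) by breadth-first search. It is one possible approach under simple assumptions (vertices are comparable labels, edges are pairs); the function name is hypothetical.

```python
from collections import deque


def connected_components(vertices, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen, components = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        # Collect everything reachable from start.
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components


V = {1, 2, 3, 4, 5, 6}
E = [(1, 2), (1, 5), (2, 5), (3, 6)]
print(connected_components(V, E))  # [{1, 2, 5}, {3, 6}, {4}]
```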

⁴Some authors refer to what we call a path as a “walk” and to what we call a simple path as just a
“path.” We use the terms “path” and “simple path” throughout this book in a manner consistent with
their definitions.

Figure B.3 (a) A pair of isomorphic graphs. The vertices of the top graph are mapped to the
vertices of the bottom graph by f(1) = u, f(2) = v, f(3) = w, f(4) = x, f(5) = y, f(6) = z.
(b) Two graphs that are not isomorphic, since the top graph has a vertex of degree 4 and the bottom
graph does not.

A directed graph is strongly connected if every two vertices are reachable from
each other. The strongly connected components of a directed graph are the
equivalence classes of vertices under the “are mutually reachable” relation. A directed
graph is strongly connected if it has only one strongly connected component. The
graph in Figure B.2(a) has three strongly connected components: {1, 2, 4, 5}, {3},
and {6}. All pairs of vertices in {1, 2, 4, 5} are mutually reachable. The
vertices {3, 6} do not form a strongly connected component, since vertex 6 cannot
be reached from vertex 3.
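
The definition by mutual reachability translates directly into a brute-force Python sketch, shown here on the graph of Figure B.2(a). This is a literal rendering of the definition for small examples, not an efficient algorithm, and the function names are assumptions.

```python
def reachable_from(edges, s):
    """Vertices reachable from s, including s itself via the 0-length path."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def strongly_connected_components(vertices, edges):
    """Group vertices into classes of the 'are mutually reachable' relation."""
    reach = {v: reachable_from(edges, v) for v in vertices}
    components = []
    for v in sorted(vertices):
        component = {u for u in vertices if u in reach[v] and v in reach[u]}
        if component not in components:
            components.append(component)
    return components


V = {1, 2, 3, 4, 5, 6}
E = {(1, 2), (2, 2), (2, 4), (2, 5), (4, 1), (4, 5), (5, 4), (6, 3)}
print(strongly_connected_components(V, E))  # [{1, 2, 4, 5}, {3}, {6}]
```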

Two graphs G = (V, E) and G' = (V', E') are isomorphic if there exists a
bijection $f : V \to V'$ such that $(u, v) \in E$ if and only if $(f(u), f(v)) \in E'$.
In other words, we can relabel the vertices of G to be vertices of G', maintaining
the corresponding edges in G and G'. Figure B.3(a) shows a pair of isomorphic
graphs G and G' with respective vertex sets V = {1, 2, 3, 4, 5, 6} and
V' = {u, v, w, x, y, z}. The mapping from V to V' given by f(1) = u, f(2) = v,
f(3) = w, f(4) = x, f(5) = y, f(6) = z provides the required bijective function.
The graphs in Figure B.3(b) are not isomorphic. Although both graphs have
5 vertices and 7 edges, the top graph has a vertex of degree 4 and the bottom graph
does not.

We say that a graph G' = (V', E') is a subgraph of G = (V, E) if $V' \subseteq V$
and $E' \subseteq E$. Given a set $V' \subseteq V$, the subgraph of G induced by V' is the graph
G' = (V', E'), where

\[ E' = \{(u, v) \in E : u, v \in V'\} . \]

The subgraph induced by the vertex set {1, 2, 3, 6} in Figure B.2(a) appears in
Figure B.2(c) and has the edge set {(1, 2), (2, 2), (6, 3)}.
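
The induced edge set is a one-line filter, as the following Python sketch shows for the example from Figure B.2(a). The function name `induced_edges` is illustrative.

```python
def induced_edges(edges, subset):
    """Edges of the subgraph induced by `subset`: both endpoints must lie in it."""
    return {(u, v) for (u, v) in edges if u in subset and v in subset}


E = {(1, 2), (2, 2), (2, 4), (2, 5), (4, 1), (4, 5), (5, 4), (6, 3)}
print(sorted(induced_edges(E, {1, 2, 3, 6})))  # [(1, 2), (2, 2), (6, 3)]
```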

Given an undirected graph G = (V, E), the directed version of G is the directed
graph G' = (V, E'), where $(u, v) \in E'$ if and only if $(u, v) \in E$. That is, we
replace each undirected edge (u, v) in G by the two directed edges (u, v) and (v, u)
in the directed version. Given a directed graph G = (V, E), the undirected version
of G is the undirected graph G' = (V, E'), where $(u, v) \in E'$ if and only if $u \ne v$
and $(u, v) \in E$. That is, the undirected version contains the edges of G “with
their directions removed” and with self-loops eliminated. (Since (u, v) and (v, u)
are the same edge in an undirected graph, the undirected version of a directed
graph contains it only once, even if the directed graph contains both edges (u, v)
and (v, u).) In a directed graph G = (V, E), a neighbor of a vertex u is any vertex
that is adjacent to u in the undirected version of G. That is, v is a neighbor of u if
$u \ne v$ and either $(u, v) \in E$ or $(v, u) \in E$. In an undirected graph, u and v are
neighbors if they are adjacent.
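
The two conversions can be sketched in Python as follows. The small example graph is hypothetical (it is not one of the figures), directed edges are modeled as tuples, and undirected edges as two-element `frozenset`s; all function names are assumptions.

```python
def directed_version(undirected_edges):
    """Replace each undirected edge {u, v} by the directed edges (u, v) and (v, u)."""
    result = set()
    for edge in undirected_edges:
        u, v = tuple(edge)
        result |= {(u, v), (v, u)}
    return result


def undirected_version(directed_edges):
    """Drop directions and self-loops; (u, v) and (v, u) collapse to one edge."""
    return {frozenset((u, v)) for (u, v) in directed_edges if u != v}


def neighbors(directed_edges, u):
    """Vertices adjacent to u in the undirected version of the graph."""
    return {w for e in undirected_version(directed_edges) if u in e for w in e if w != u}


D = {("a", "b"), ("b", "a"), ("b", "b"), ("b", "c")}
print(undirected_version(D) == {frozenset("ab"), frozenset("bc")})  # True
print(neighbors(D, "b") == {"a", "c"})                              # True
```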

Several kinds of graphs have special names. A complete graph is an undirected
graph in which every pair of vertices is adjacent. A bipartite graph is an undirected
graph G = (V, E) in which V can be partitioned into two sets $V_1$ and $V_2$ such that
$(u, v) \in E$ implies either $u \in V_1$ and $v \in V_2$ or $u \in V_2$ and $v \in V_1$. That is, all
edges go between the two sets $V_1$ and $V_2$. An acyclic, undirected graph is a forest,
and a connected, acyclic, undirected graph is a (free) tree (see Section B.5). We
often take the first letters of “directed acyclic graph” and call such a graph a dag.

There are two variants of graphs that you may occasionally encounter. A
multigraph is like an undirected graph, but it can have both multiple edges between
vertices and self-loops. A hypergraph is like an undirected graph, but each hyperedge,
rather than connecting two vertices, connects an arbitrary subset of vertices. Many
algorithms written for ordinary directed and undirected graphs can be adapted to
run on these graphlike structures.

The contraction of an undirected graph G = (V, E) by an edge e = (u, v) is a
graph G' = (V', E'), where $V' = V - \{u, v\} \cup \{x\}$ and x is a new vertex. The set
of edges E' is formed from E by deleting the edge (u, v) and, for each vertex w
incident on u or v, deleting whichever of (u, w) and (v, w) is in E and adding the
new edge (x, w). In effect, u and v are “contracted” into a single vertex.
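
A minimal Python sketch of contraction follows, assuming undirected edges are stored as two-element `frozenset`s and that the caller supplies the label of the new vertex x. The function name `contract` and the example graph are hypothetical.

```python
def contract(vertices, edges, u, v, x):
    """Contract the undirected edge {u, v} into a new vertex x.

    Returns the new vertex set and edge set. Every surviving edge that touched
    u or v is redirected to x; parallel edges merge because edges form a set.
    """
    uv = frozenset((u, v))
    if uv not in edges or x in vertices:
        raise ValueError("need an existing edge and a fresh vertex label")
    new_vertices = (vertices - {u, v}) | {x}
    new_edges = set()
    for edge in edges:
        if edge == uv:
            continue  # the contracted edge itself disappears
        new_edges.add(frozenset(x if w in (u, v) else w for w in edge))
    return new_vertices, new_edges


V = {1, 2, 3, 4}
E = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4)]}
V2, E2 = contract(V, E, 1, 2, 0)
print(V2 == {0, 3, 4})                                # True
print(E2 == {frozenset((0, 3)), frozenset((3, 4))})   # True
```

In the example, the edges (1, 3) and (2, 3) both become (0, 3), so the contracted graph has one fewer vertex and two fewer edges.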

Exercises 


B.4-1 

Attendees of a faculty party shake hands to greet each other, and each professor
remembers how many times he or she shook hands. At the end of the party, the
department head adds up the number of times that each professor shook hands.
Show that the result is even by proving the handshaking lemma: if G = (V, E) is
an undirected graph, then

\[ \sum_{v \in V} \text{degree}(v) = 2|E| . \]


B.4-2 

Show that if a directed or undirected graph contains a path between two vertices u
and v, then it contains a simple path between u and v. Show that if a directed graph
contains a cycle, then it contains a simple cycle.


B.4-3 

Show that any connected, undirected graph G = (V, E) satisfies $|E| \ge |V| - 1$.


B.4-4 

Verify that in an undirected graph, the “is reachable from” relation is an
equivalence relation on the vertices of the graph. Which of the three properties of an
equivalence relation hold in general for the “is reachable from” relation on the
vertices of a directed graph?


B.4-5 

What  is  the  undirected  version  of  the  directed  graph  in  Figure  B.2(a)?  What  is  the 
directed  version  of  the  undirected  graph  in  Figure  B.2(b)? 

B.4-6  * 

Show that we can represent a hypergraph by a bipartite graph if we let incidence in
the hypergraph correspond to adjacency in the bipartite graph. (Hint: Let one set
of vertices in the bipartite graph correspond to vertices of the hypergraph, and let
the other set of vertices of the bipartite graph correspond to hyperedges.)


B.5  Trees 


As  with  graphs,  there  are  many  related,  but  slightly  different,  notions  of  trees.  This 
section  presents  definitions  and  mathematical  properties  of  several  kinds  of  trees. 
Sections  10.4  and  22.1  describe  how  we  can  represent  trees  in  computer  memory. 

B.5.1  Free  trees 

As defined in Section B.4, a free tree is a connected, acyclic, undirected graph. We
often omit the adjective “free” when we say that a graph is a tree. If an undirected
graph is acyclic but possibly disconnected, it is a forest. Many algorithms that work
for trees also work for forests. Figure B.4(a) shows a free tree, and Figure B.4(b)
shows a forest. The forest in Figure B.4(b) is not a tree because it is not connected.
The graph in Figure B.4(c) is connected but neither a tree nor a forest, because it
contains a cycle.

Figure B.4 (a) A free tree. (b) A forest. (c) A graph that contains a cycle and is therefore neither
a tree nor a forest.

The  following  theorem  captures  many  important  facts  about  free  trees. 

Theorem  B.2  (Properties  of  free  trees) 

Let  G  =  (V,  E)  be  an  undirected  graph.  The  following  statements  are  equivalent. 

1.  G  is  a  free  tree. 

2.  Any  two  vertices  in  G  are  connected  by  a  unique  simple  path. 

3. G is connected, but if any edge is removed from E, the resulting graph is
disconnected.

4. G is connected, and $|E| = |V| - 1$.

5. G is acyclic, and $|E| = |V| - 1$.

6.  G  is  acyclic,  but  if  any  edge  is  added  to  E,  the  resulting  graph  contains  a  cycle. 


Proof (1) $\Rightarrow$ (2): Since a tree is connected, any two vertices in G are connected
by at least one simple path. Suppose, for the sake of contradiction, that vertices u
and v are connected by two distinct simple paths $p_1$ and $p_2$, as shown in Figure B.5.
Let w be the vertex at which the paths first diverge; that is, w is the first vertex
on both $p_1$ and $p_2$ whose successor on $p_1$ is x and whose successor on $p_2$ is y,
where $x \ne y$. Let z be the first vertex at which the paths reconverge; that is, z is
the first vertex following w on $p_1$ that is also on $p_2$. Let $p'$ be the subpath of $p_1$
from w through x to z, and let $p''$ be the subpath of $p_2$ from w through y to z.
Paths $p'$ and $p''$ share no vertices except their endpoints. Thus, the path obtained by
concatenating $p'$ and the reverse of $p''$ is a cycle, which contradicts our assumption
that G is a tree. Thus, if G is a tree, there can be at most one simple path between
two vertices.

Figure B.5 A step in the proof of Theorem B.2: if (1) G is a free tree, then (2) any two vertices
in G are connected by a unique simple path. Assume for the sake of contradiction that vertices u
and v are connected by two distinct simple paths $p_1$ and $p_2$. These paths first diverge at vertex w,
and they first reconverge at vertex z. The path $p'$ concatenated with the reverse of the path $p''$ forms
a cycle, which yields the contradiction.

(2) $\Rightarrow$ (3): If any two vertices in G are connected by a unique simple path,
then G is connected. Let (u, v) be any edge in E. This edge is a path from u to v,
and so it must be the unique path from u to v. If we remove (u, v) from G, there
is no path from u to v, and hence its removal disconnects G.

(3) $\Rightarrow$ (4): By assumption, the graph G is connected, and by Exercise B.4-3, we
have $|E| \ge |V| - 1$. We shall prove $|E| \le |V| - 1$ by induction. A connected
graph with n = 1 or n = 2 vertices has n - 1 edges. Suppose that G has $n \ge 3$
vertices and that all graphs satisfying (3) with fewer than n vertices also satisfy
$|E| \le |V| - 1$. Removing an arbitrary edge from G separates the graph into $k \ge 2$
connected components (actually k = 2). Each component satisfies (3), or else G
would not satisfy (3). If we view each connected component $V_i$, with edge set $E_i$,
as its own free tree, then because each component has fewer than $|V|$ vertices, by
the inductive hypothesis we have $|E_i| \le |V_i| - 1$. Thus, the number of edges in all
components combined is at most $|V| - k \le |V| - 2$. Adding in the removed edge
yields $|E| \le |V| - 1$.

(4) $\Rightarrow$ (5): Suppose that G is connected and that $|E| = |V| - 1$. We must show
that G is acyclic. Suppose that G has a cycle containing k vertices $v_1, v_2, \ldots, v_k$,
and without loss of generality assume that this cycle is simple. Let $G_k = (V_k, E_k)$
be the subgraph of G consisting of the cycle. Note that $|V_k| = |E_k| = k$.
If $k < |V|$, there must be a vertex $v_{k+1} \in V - V_k$ that is adjacent to some
vertex $v_i \in V_k$, since G is connected. Define $G_{k+1} = (V_{k+1}, E_{k+1})$ to be the
subgraph of G with $V_{k+1} = V_k \cup \{v_{k+1}\}$ and $E_{k+1} = E_k \cup \{(v_i, v_{k+1})\}$. Note that
$|V_{k+1}| = |E_{k+1}| = k + 1$. If $k + 1 < |V|$, we can continue, defining $G_{k+2}$ in
the same manner, and so forth, until we obtain $G_n = (V_n, E_n)$, where $n = |V|$,
$V_n = V$, and $|E_n| = |V_n| = |V|$. Since $G_n$ is a subgraph of G, we have $E_n \subseteq E$,
and hence $|E| \ge |V|$, which contradicts the assumption that $|E| = |V| - 1$. Thus,
G is acyclic.

(5) $\Rightarrow$ (6): Suppose that G is acyclic and that $|E| = |V| - 1$. Let k be the
number of connected components of G. Each connected component is a free tree
by definition, and since (1) implies (5), the sum of all edges in all connected
components of G is $|V| - k$. Consequently, we must have k = 1, and G is in fact a
tree. Since (1) implies (2), any two vertices in G are connected by a unique simple
path. Thus, adding any edge to G creates a cycle.

(6) $\Rightarrow$ (1): Suppose that G is acyclic but that adding any edge to E creates a
cycle. We must show that G is connected. Let u and v be arbitrary vertices in G.
If u and v are not already adjacent, adding the edge (u, v) creates a cycle in which
all edges but (u, v) belong to G. Thus, the cycle minus edge (u, v) must contain a
path from u to v, and since u and v were chosen arbitrarily, G is connected. ■
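
Theorem B.2 suggests a simple test for whether a graph is a free tree: by statement 4, it suffices to check connectivity and count edges. The Python sketch below does this under the assumptions that the vertex set is nonempty and that the edge collection has no self-loops or duplicate edges; the function names and the example graphs are hypothetical.

```python
def is_connected(vertices, edges):
    """True if every vertex is reachable from an arbitrary start vertex."""
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            # Treat each pair as an undirected edge.
            for x, y in ((a, b), (b, a)):
                if x == u and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == set(vertices)


def is_free_tree(vertices, edges):
    """Statement 4 of Theorem B.2: connected, and |E| = |V| - 1."""
    return is_connected(vertices, edges) and len(edges) == len(vertices) - 1


V = {1, 2, 3, 4}
print(is_free_tree(V, [(1, 2), (2, 3), (2, 4)]))          # True
print(is_free_tree(V, [(1, 2), (2, 3), (2, 4), (3, 4)]))  # False: one edge too many
print(is_free_tree(V, [(1, 2), (2, 3), (3, 1)]))          # False: 3 edges, but vertex 4 is cut off
```

The last example has exactly $|V| - 1$ edges yet is not a tree, which shows why the edge count alone is not enough.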


B.5.2  Rooted  and  ordered  trees 

A rooted tree is a free tree in which one of the vertices is distinguished from the
others. We call the distinguished vertex the root of the tree. We often refer to a
vertex of a rooted tree as a node⁵ of the tree. Figure B.6(a) shows a rooted tree on
a set of 12 nodes with root 7.

Consider a node x in a rooted tree T with root r. We call any node y on the
unique simple path from r to x an ancestor of x. If y is an ancestor of x, then x is
a descendant of y. (Every node is both an ancestor and a descendant of itself.) If y
is an ancestor of x and $x \ne y$, then y is a proper ancestor of x and x is a proper
descendant of y. The subtree rooted at x is the tree induced by descendants of x,
rooted at x. For example, the subtree rooted at node 8 in Figure B.6(a) contains
nodes 8, 6, 5, and 9.

If  the  last  edge  on  the  simple  path  from  the  root  r  of  a  tree  T  to  a  node  x  is  (y,  x), 
then  y  is  the  parent  of  x,  and  x  is  a  child  of  y.  The  root  is  the  only  node  in  T  with 
no  parent.  If  two  nodes  have  the  same  parent,  they  are  siblings.  A  node  with  no 
children  is  a  leaf  or  external  node.  A  nonleaf  node  is  an  internal  node. 


⁵The term “node” is often used in the graph theory literature as a synonym for “vertex.” We reserve
the term “node” to mean a vertex of a rooted tree.

Figure B.6 Rooted and ordered trees. (a) A rooted tree with height 4. The tree is drawn in a
standard way: the root (node 7) is at the top, its children (nodes with depth 1) are beneath it, their
children (nodes with depth 2) are beneath them, and so forth. If the tree is ordered, the relative left
to right order of the children of a node matters; otherwise it doesn’t. (b) Another rooted tree. As a
rooted tree, it is identical to the tree in (a), but as an ordered tree it is different, since the children of
node 3 appear in a different order.

The number of children of a node x in a rooted tree T equals the degree of x.⁶
The length of the simple path from the root r to a node x is the depth of x in T.
A level of a tree consists of all nodes at the same depth. The height of a node in a
tree is the number of edges on the longest simple downward path from the node to
a leaf, and the height of a tree is the height of its root. The height of a tree is also
equal to the largest depth of any node in the tree.
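
These definitions can be exercised on a small example. The Python sketch below uses a hypothetical rooted tree (not the one in Figure B.6), stored as a map from each node to its list of children; the representation and function names are assumptions made for illustration.

```python
# A hypothetical rooted tree with root "r": each node maps to its list of children.
children = {"r": ["a", "b"], "a": ["c", "d"], "b": [], "c": ["e"], "d": [], "e": []}


def degree(node):
    """Degree in a rooted tree: the number of children (the parent does not count)."""
    return len(children[node])


def depths(root):
    """Depth of every node: the length of the simple path from the root."""
    result, stack = {root: 0}, [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            result[child] = result[node] + 1
            stack.append(child)
    return result


def height(node):
    """Number of edges on the longest downward path from node to a leaf."""
    kids = children[node]
    return 0 if not kids else 1 + max(height(c) for c in kids)


d = depths("r")
print(d["e"], height("r"), max(d.values()))  # 3 3 3
```

The final line illustrates the last sentence above: the height of the tree equals the largest depth of any node.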

An ordered tree is a rooted tree in which the children of each node are ordered.
That is, if a node has k children, then there is a first child, a second child, ...,
and a kth child. The two trees in Figure B.6 are different when considered to be
ordered trees, but the same when considered to be just rooted trees.

B.5.3  Binary  and  positional  trees 

We define binary trees recursively. A binary tree T is a structure defined on a finite
set of nodes that either

•  contains no nodes, or

•  is composed of three disjoint sets of nodes: a root node, a binary tree called its
left subtree, and a binary tree called its right subtree.

⁶Notice that the degree of a node depends on whether we consider T to be a rooted tree or a free tree.
The degree of a vertex in a free tree is, as in any undirected graph, the number of adjacent vertices.
In a rooted tree, however, the degree is the number of children; the parent of a node does not count
toward its degree.

Figure B.7 Binary trees. (a) A binary tree drawn in a standard way. The left child of a node is
drawn beneath the node and to the left. The right child is drawn beneath and to the right. (b) A binary
tree different from the one in (a). In (a), the left child of node 7 is 5 and the right child is absent.
In (b), the left child of node 7 is absent and the right child is 5. As ordered trees, these trees are
the same, but as binary trees, they are distinct. (c) The binary tree in (a) represented by the internal
nodes of a full binary tree: an ordered tree in which each internal node has degree 2. The leaves in
the tree are shown as squares.

The binary tree that contains no nodes is called the empty tree or null tree,
sometimes denoted NIL. If the left subtree is nonempty, its root is called the left child of
the root of the entire tree. Likewise, the root of a nonnull right subtree is the right
child of the root of the entire tree. If a subtree is the null tree NIL, we say that the
child is absent or missing. Figure B.7(a) shows a binary tree.

A binary tree is not simply an ordered tree in which each node has degree at
most 2. For example, in a binary tree, if a node has just one child, the position
of the child (whether it is the left child or the right child) matters. In an
ordered tree, there is no distinguishing a sole child as being either left or right.
Figure B.7(b) shows a binary tree that differs from the tree in Figure B.7(a) because of
the position of one node. Considered as ordered trees, however, the two trees are
identical.
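
The distinction can be seen in a small Python sketch that models only node 7 and its sole child 5, as described in the caption of Figure B.7. The class `BinaryTree` and the helper `as_ordered` are hypothetical names, and `None` stands in for NIL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinaryTree:
    key: int
    left: Optional["BinaryTree"] = None   # None plays the role of NIL
    right: Optional["BinaryTree"] = None


def as_ordered(t):
    """Forget positions: keep only the left-to-right list of children that are present."""
    present = [c for c in (t.left, t.right) if c is not None]
    return (t.key, [as_ordered(c) for c in present])


a = BinaryTree(7, left=BinaryTree(5))    # 5 is the left child; the right child is absent
b = BinaryTree(7, right=BinaryTree(5))   # the left child is absent; 5 is the right child

print(a == b)                          # False: distinct as binary trees
print(as_ordered(a) == as_ordered(b))  # True: identical as ordered trees
```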

We can represent the positioning information in a binary tree by the internal
nodes of an ordered tree, as shown in Figure B.7(c). The idea is to replace each
missing child in the binary tree with a node having no children. These leaf nodes
are drawn as squares in the figure. The tree that results is a full binary tree: each
node is either a leaf or has degree exactly 2. There are no degree-1 nodes.
Consequently, the order of the children of a node preserves the position information.

We can extend the positioning information that distinguishes binary trees from
ordered trees to trees with more than 2 children per node. In a positional tree, the
children of a node are labeled with distinct positive integers. The ith child of a
node is absent if no child is labeled with integer i. A k-ary tree is a positional tree
in which for every node, all children with labels greater than k are missing. Thus,
a binary tree is a k-ary tree with k = 2.

Figure B.8 A complete binary tree of height 3 with 8 leaves and 7 internal nodes.

A complete k-ary tree is a k-ary tree in which all leaves have the same depth
and all internal nodes have degree k. Figure B.8 shows a complete binary tree of
height 3. How many leaves does a complete k-ary tree of height h have? The root
has k children at depth 1, each of which has k children at depth 2, etc. Thus, the
number of leaves at depth h is $k^h$. Consequently, the height of a complete k-ary
tree with n leaves is $\log_k n$. The number of internal nodes of a complete k-ary tree
of height h is

\[
1 + k + k^2 + \cdots + k^{h-1} = \sum_{i=0}^{h-1} k^i = \frac{k^h - 1}{k - 1}
\]

by equation (A.5). Thus, a complete binary tree has $2^h - 1$ internal nodes.
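
The counts above are easy to confirm numerically. The following Python sketch compares the level-by-level sum with the closed form for several values of k and h; the function names are illustrative, and the closed form is checked only for $k \ge 2$, where the denominator is nonzero.

```python
def num_leaves(k, h):
    """Leaves of a complete k-ary tree of height h: k^h."""
    return k ** h


def num_internal(k, h):
    """Internal nodes, counted level by level: 1 + k + k^2 + ... + k^(h-1)."""
    return sum(k ** i for i in range(h))


# The level-by-level sum agrees with the closed form (k^h - 1) / (k - 1).
for k in range(2, 6):
    for h in range(0, 8):
        assert num_internal(k, h) == (k ** h - 1) // (k - 1)

print(num_leaves(2, 3), num_internal(2, 3))  # 8 7
```

The printed values match Figure B.8, a complete binary tree of height 3 with 8 leaves and 7 internal nodes.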
Exercises 


B.5-1 

Draw all the free trees composed of the three vertices x, y, and z. Draw all the
rooted trees with nodes x, y, and z with x as the root. Draw all the ordered trees
with nodes x, y, and z with x as the root. Draw all the binary trees with nodes x,
y, and z with x as the root.

B.5-2

Let G = (V, E) be a directed acyclic graph in which there is a vertex $v_0 \in V$
such that there exists a unique path from $v_0$ to every vertex $v \in V$. Prove that the
undirected version of G forms a tree.


B.5-3 

Show  by  induction  that  the  number  of  degree-2  nodes  in  any  nonempty  binary  tree 
is  1  fewer  than  the  number  of  leaves.  Conclude  that  the  number  of  internal  nodes 
in  a  full  binary  tree  is  1  fewer  than  the  number  of  leaves. 


B.5-4 

Use induction to show that a nonempty binary tree with n nodes has height at
least $\lfloor \lg n \rfloor$.

B.5-5  * 

The  internal  path  length  of  a  full  binary  tree  is  the  sum,  taken  over  all  internal 
nodes  of  the  tree,  of  the  depth  of  each  node.  Likewise,  the  external  path  length  is 
the  sum,  taken  over  all  leaves  of  the  tree,  of  the  depth  of  each  leaf.  Consider  a  full 
binary  tree  with  n  internal  nodes,  internal  path  length  i,  and  external  path  length  e. 
Prove  that  e  =  i  +  2n . 

B.5-6  * 

Let us associate a “weight” $w(x) = 2^{-d}$ with each leaf x of depth d in a binary
tree T, and let L be the set of leaves of T. Prove that $\sum_{x \in L} w(x) \le 1$. (This is
known as the Kraft inequality.)

B.5-7  * 

Show that if $L \ge 2$, then every binary tree with L leaves contains a subtree having
between L/3 and 2L/3 leaves, inclusive.


Problems 


B-1 Graph coloring

Given an undirected graph G = (V, E), a k-coloring of G is a function $c : V \to
\{0, 1, \ldots, k - 1\}$ such that $c(u) \ne c(v)$ for every edge $(u, v) \in E$. In other words,
the numbers $0, 1, \ldots, k - 1$ represent the k colors, and adjacent vertices must have
different colors.


a. Show that any tree is 2-colorable.

b. Show that the following are equivalent:

1.  G  is  bipartite. 

2.  G  is  2-colorable. 

3.  G  has  no  cycles  of  odd  length. 

c. Let d be the maximum degree of any vertex in a graph G. Prove that we can
color G with d + 1 colors.

d. Show that if G has $O(|V|)$ edges, then we can color G with $O(\sqrt{|V|})$ colors.

B-2 Friendly graphs

Reword  each  of  the  following  statements  as  a  theorem  about  undirected  graphs, 
and  then  prove  it.  Assume  that  friendship  is  symmetric  but  not  reflexive. 

a.  Any  group  of  at  least  two  people  contains  at  least  two  people  with  the  same 
number  of  friends  in  the  group. 

b.  Every  group  of  six  people  contains  either  at  least  three  mutual  friends  or  at  least 
three  mutual  strangers. 

c.  Any  group  of  people  can  be  partitioned  into  two  subgroups  such  that  at  least 
half  the  friends  of  each  person  belong  to  the  subgroup  of  which  that  person  is 
not  a  member. 

d.  If  everyone  in  a  group  is  the  friend  of  at  least  half  the  people  in  the  group,  then 
the  group  can  be  seated  around  a  table  in  such  a  way  that  everyone  is  seated 
between  two  friends. 

B-3  Bisecting  trees 

Many  divide-and-conquer  algorithms  that  operate  on  graphs  require  that  the  graph 
be  bisected  into  two  nearly  equal-sized  subgraphs,  which  are  induced  by  a  partition 
of  the  vertices.  This  problem  investigates  bisections  of  trees  formed  by  removing  a 
small  number  of  edges.  We  require  that  whenever  two  vertices  end  up  in  the  same 
subtree  after  removing  edges,  then  they  must  be  in  the  same  partition. 

a. Show that we can partition the vertices of any n-vertex binary tree into two
sets A and B, such that $|A| \le 3n/4$ and $|B| \le 3n/4$, by removing a single
edge.

b. Show that the constant 3/4 in part (a) is optimal in the worst case by giving
an example of a simple binary tree whose most evenly balanced partition upon
removal of a single edge has $|A| = 3n/4$.

c. Show that by removing at most $O(\lg n)$ edges, we can partition the vertices
of any n-vertex binary tree into two sets A and B such that $|A| = \lfloor n/2 \rfloor$
and $|B| = \lceil n/2 \rceil$.


Appendix  notes 

G.  Boole  pioneered  the  development  of  symbolic  logic,  and  he  introduced  many  of 
the basic set notations in a book published in 1854. Modern set theory was created
by  G.  Cantor  during  the  period  1874-1895.  Cantor  focused  primarily  on  sets  of 
infinite  cardinality.  The  term  “function”  is  attributed  to  G.  W.  Leibniz,  who  used  it 
to  refer  to  several  kinds  of  mathematical  formulas.  His  limited  definition  has  been 
generalized  many  times.  Graph  theory  originated  in  1736,  when  L.  Euler  proved 
that it was impossible to cross each of the seven bridges in the city of Königsberg
exactly  once  and  return  to  the  starting  point. 

The  book  by  Harary  [160]  provides  a  useful  compendium  of  many  definitions 
and  results  from  graph  theory. 


C


Counting  and  Probability 


This  appendix  reviews  elementary  combinatorics  and  probability  theory.  If  you 
have  a  good  background  in  these  areas,  you  may  want  to  skim  the  beginning  of  this 
appendix  lightly  and  concentrate  on  the  later  sections.  Most  of  this  book’s  chapters 
do  not  require  probability,  but  for  some  chapters  it  is  essential. 

Section C.1 reviews elementary results in counting theory, including standard
formulas  for  counting  permutations  and  combinations.  The  axioms  of  probability 
and  basic  facts  concerning  probability  distributions  form  Section  C.2.  Random 
variables  are  introduced  in  Section  C.3,  along  with  the  properties  of  expectation 
and  variance.  Section  C.4  investigates  the  geometric  and  binomial  distributions 
that  arise  from  studying  Bernoulli  trials.  The  study  of  the  binomial  distribution 
continues  in  Section  C.5,  an  advanced  discussion  of  the  “tails”  of  the  distribution. 


C.1 Counting

Counting theory tries to answer the question “How many?” without actually
enumerating all the choices. For example, we might ask, “How many different n-bit
numbers are there?” or “How many orderings of n distinct elements are there?” In
this section, we review the elements of counting theory. Since some of the material
assumes a basic understanding of sets, you might wish to start by reviewing the
material in Section B.1.

Rules  of  sum  and  product 

We  can  sometimes  express  a  set  of  items  that  we  wish  to  count  as  a  union  of  disjoint 
sets  or  as  a  Cartesian  product  of  sets. 

The rule of sum says that the number of ways to choose one element from one
of two disjoint sets is the sum of the cardinalities of the sets. That is, if A and B
are two finite sets with no members in common, then $|A \cup B| = |A| + |B|$, which
follows from equation (B.3). For example, each position on a car’s license plate
is a letter or a digit. The number of possibilities for each position is therefore
26 + 10 = 36, since there are 26 choices if it is a letter and 10 choices if it is a
digit.

The rule of product says that the number of ways to choose an ordered pair is the
number of ways to choose the first element times the number of ways to choose the
second element. That is, if A and B are two finite sets, then $|A \times B| = |A| \cdot |B|$,
which is simply equation (B.4). For example, if an ice-cream parlor offers 28
flavors of ice cream and 4 toppings, the number of possible sundaes with one scoop
of ice cream and one topping is 28 · 4 = 112.

Strings 

A  string  over  a  finite  set  S  is  a  sequence  of  elements  of  S.  For  example,  there  are  8 
binary  strings  of  length  3: 

000, 001, 010, 011, 100, 101, 110, 111 .

We sometimes call a string of length k a k-string. A substring s' of a string s
is an ordered sequence of consecutive elements of s. A k-substring of a string
is a substring of length k. For example, 010 is a 3-substring of 01101001 (the
3-substring that begins in position 4), but 111 is not a substring of 01101001.

We can view a k-string over a set S as an element of the Cartesian product $S^k$
of k-tuples; thus, there are $|S|^k$ strings of length k. For example, the number of
binary k-strings is $2^k$. Intuitively, to construct a k-string over an n-set, we have n
ways to pick the first element; for each of these choices, we have n ways to pick the
second element; and so forth k times. This construction leads to the k-fold product
$n \cdot n \cdots n = n^k$ as the number of k-strings.
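
As an illustration that is not part of the original text, the following Python sketch enumerates the k-strings over a small alphabet and confirms the count $|S|^k$. The function name `k_strings` is our own choice.

```python
from itertools import product

def k_strings(alphabet, k):
    """Return all k-strings over the alphabet, in lexicographic order."""
    return ["".join(t) for t in product(alphabet, repeat=k)]

binary_3_strings = k_strings("01", 3)
print(binary_3_strings)               # the binary strings of length 3
print(len(k_strings("abc", 4)), 3 ** 4)  # both counts should agree
```

Enumeration is feasible only for small n and k; the formula $n^k$ is what lets us count without listing every string.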

Permutations 

A  permutation  of  a  finite  set  S  is  an  ordered  sequence  of  all  the  elements  of  S, 
with  each  element  appearing  exactly  once.  For  example,  if  S  =  {a,  b,  c},  then  S 
has  6  permutations: 

abc,acb,bac,bca,cab,cba  . 

There are n! permutations of a set of n elements, since we can choose the first
element of the sequence in n ways, the second in n − 1 ways, the third in n − 2
ways, and so on.

A k-permutation of S is an ordered sequence of k elements of S, with no element
appearing more than once in the sequence. (Thus, an ordinary permutation is
an n-permutation of an n-set.) The twelve 2-permutations of the set {a, b, c, d} are

ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, dc .

The number of k-permutations of an n-set is

$$n(n-1)(n-2) \cdots (n-k+1) = \frac{n!}{(n-k)!} , \tag{C.1}$$

since we have n ways to choose the first element, n − 1 ways to choose the second
element, and so on, until we have selected k elements, the last being a selection
from the remaining n − k + 1 elements.
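
A small Python sketch (our own illustration, with the hypothetical helper name `count_k_permutations`) compares equation (C.1) against a direct enumeration of the 2-permutations of {a, b, c, d}.

```python
from itertools import permutations
from math import factorial

def count_k_permutations(n, k):
    """Number of k-permutations of an n-set, n!/(n-k)!, as in equation (C.1)."""
    return factorial(n) // factorial(n - k)

# Enumerate the 2-permutations of {a, b, c, d} directly.
two_perms = ["".join(p) for p in permutations("abcd", 2)]
print(two_perms)
print(len(two_perms), count_k_permutations(4, 2))
```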


Combinations 


A k-combination of an n-set S is simply a k-subset of S. For example, the 4-set
{a, b, c, d} has six 2-combinations:

ab,ac,ad,bc,bd,cd  . 


(Here we use the shorthand of denoting the 2-subset {a, b} by ab, and so on.)
We can construct a k-combination of an n-set by choosing k distinct (different)
elements from the n-set. The order in which we select the elements does not matter.

We can express the number of k-combinations of an n-set in terms of the number
of k-permutations of an n-set. Every k-combination has exactly k! permutations
of its elements, each of which is a distinct k-permutation of the n-set. Thus, the
number of k-combinations of an n-set is the number of k-permutations divided
by k!; from equation (C.1), this quantity is

$$\frac{n!}{k!\,(n-k)!} . \tag{C.2}$$


For  k  =  0,  this  formula  tells  us  that  the  number  of  ways  to  choose  0  elements  from 
an n-set is 1 (not 0), since 0! = 1.
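
The relationship between k-permutations and k-combinations can be checked on the example above. This is a sketch of our own in Python, using the standard-library functions `math.perm` and `math.comb`.

```python
from itertools import combinations
from math import comb, factorial, perm

# Enumerate the 2-combinations of {a, b, c, d}.
two_combos = ["".join(c) for c in combinations("abcd", 2)]
print(two_combos)

n, k = 4, 2
# Each k-combination corresponds to k! distinct k-permutations,
# so dividing the k-permutation count by k! gives the k-combination count.
by_division = perm(n, k) // factorial(k)
print(len(two_combos), by_division, comb(n, k))
```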


Binomial  coefficients 

The notation $\binom{n}{k}$ (read “n choose k”) denotes the number of k-combinations of
an n-set. From equation (C.2), we have

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} .$$

This formula is symmetric in k and n − k:

$$\binom{n}{k} = \binom{n}{n-k} . \tag{C.3}$$

These numbers are also known as binomial coefficients, due to their appearance in
the binomial expansion:

$$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} . \tag{C.4}$$

A special case of the binomial expansion occurs when x = y = 1:

$$2^n = \sum_{k=0}^{n} \binom{n}{k} .$$

This formula corresponds to counting the $2^n$ binary n-strings by the number of 1s
they contain: $\binom{n}{k}$ binary n-strings contain exactly k 1s, since we have $\binom{n}{k}$ ways to
choose k out of the n positions in which to place the 1s.
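
The counting argument can be checked by brute force. The Python sketch below is our own illustration, and the function names are hypothetical. It evaluates the binomial expansion directly and tallies binary n-strings by their number of 1s.

```python
from itertools import product
from math import comb

def binomial_expansion(x, y, n):
    """Evaluate the sum of C(n, k) * x^k * y^(n-k) over k = 0..n."""
    return sum(comb(n, k) * x ** k * y ** (n - k) for k in range(n + 1))

def count_strings_by_ones(n):
    """counts[k] = number of binary n-strings containing exactly k ones."""
    counts = [0] * (n + 1)
    for bits in product((0, 1), repeat=n):
        counts[sum(bits)] += 1
    return counts

print(binomial_expansion(3, 5, 4), (3 + 5) ** 4)
print(count_strings_by_ones(5), [comb(5, k) for k in range(6)])
```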

Many  identities  involve  binomial  coefficients.  The  exercises  at  the  end  of  this 
section  give  you  the  opportunity  to  prove  a  few. 


Binomial  bounds 


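We sometimes need to bound the size of a binomial coefficient. For $1 \le k \le n$,
we have the lower bound

$$\binom{n}{k} = \frac{n(n-1) \cdots (n-k+1)}{k(k-1) \cdots 1} = \left(\frac{n}{k}\right) \left(\frac{n-1}{k-1}\right) \cdots \left(\frac{n-k+1}{1}\right) \ge \left(\frac{n}{k}\right)^k .$$

Taking advantage of the inequality $k! \ge (k/e)^k$ derived from Stirling’s
approximation (3.18), we obtain the upper bounds

$$\binom{n}{k} = \frac{n(n-1) \cdots (n-k+1)}{k(k-1) \cdots 1} \le \frac{n^k}{k!} \le \left(\frac{en}{k}\right)^k . \tag{C.5}$$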


For all integers k such that $0 \le k \le n$, we can use induction (see Exercise C.1-12)
to prove the bound

$$\binom{n}{k} \le \frac{n^n}{k^k (n-k)^{n-k}} , \tag{C.6}$$


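where for convenience we assume that $0^0 = 1$. For $k = \lambda n$, where $0 \le \lambda \le 1$, we
can rewrite this bound as

$$\binom{n}{\lambda n} \le \frac{n^n}{(\lambda n)^{\lambda n} ((1-\lambda) n)^{(1-\lambda) n}} = \left( \left(\frac{1}{\lambda}\right)^{\lambda} \left(\frac{1}{1-\lambda}\right)^{1-\lambda} \right)^{n} = 2^{n H(\lambda)} ,$$

where

$$H(\lambda) = -\lambda \lg \lambda - (1-\lambda) \lg (1-\lambda) \tag{C.7}$$

is the (binary) entropy function and where, for convenience, we assume that
$0 \lg 0 = 0$, so that $H(0) = H(1) = 0$.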
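
A numerical spot check cannot replace the proofs, but it can catch transcription mistakes in the bounds. The Python sketch below is our own illustration. The helper names and the small floating-point tolerance are assumptions. It tests the lower bound $(n/k)^k$, the upper bound $(en/k)^k$, and the entropy bound $2^{nH(k/n)}$ for small n.

```python
from math import comb, e, log2

def entropy(lam):
    """Binary entropy H(lambda), using the convention 0 lg 0 = 0."""
    if lam in (0.0, 1.0):
        return 0.0
    return -lam * log2(lam) - (1 - lam) * log2(1 - lam)

def bounds_hold(n, k):
    """Check (n/k)^k <= C(n,k) <= (en/k)^k and C(n,k) <= 2^(n H(k/n))."""
    c = comb(n, k)
    tol = 1 + 1e-9  # slack for floating-point rounding when a bound is tight
    lower = (n / k) ** k
    upper = (e * n / k) ** k
    entropy_bound = 2 ** (n * entropy(k / n))
    return lower <= c * tol and c <= upper * tol and c <= entropy_bound * tol

all_ok = all(bounds_hold(n, k) for n in range(1, 40) for k in range(1, n + 1))
print(all_ok)
```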


Exercises 

C.1-1

How many k-substrings does an n-string have? (Consider identical k-substrings at
different positions to be different.) How many substrings does an n-string have in
total?


C.1-2

An n-input, m-output boolean function is a function from $\{\mathrm{TRUE}, \mathrm{FALSE}\}^n$ to
$\{\mathrm{TRUE}, \mathrm{FALSE}\}^m$. How many n-input, 1-output boolean functions are there? How
many n-input, m-output boolean functions are there?


C.1-3

In how many ways can n professors sit around a circular conference table?
Consider two seatings to be the same if one can be rotated to form the other.


C.1-4

In how many ways can we choose three distinct numbers from the set {1, 2, ..., 99}
so that their sum is even?




C.1-5

Prove  the  identity 


for  0  <  k  <  n. 


C.1-6

Prove  the  identity 


for  0  <  k  <  n. 


(C.8) 


C.1-7

To choose k objects from n, you can make one of the objects distinguished and
consider whether the distinguished object is chosen. Use this approach to prove
that

$$\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1} .$$


C.1-8

Using the result of Exercise C.1-7, make a table for n = 0, 1, ..., 6 and $0 \le k \le n$
of the binomial coefficients $\binom{n}{k}$ with $\binom{0}{0}$ at the top, $\binom{1}{0}$ and $\binom{1}{1}$ on the next line, and
so forth. Such a table of binomial coefficients is called Pascal’s triangle.


C.1-9

Prove that

$$\sum_{i=1}^{n} i = \binom{n+1}{2} .$$


C.1-10

Show that for any integers $n \ge 0$ and $0 \le k \le n$, the expression $\binom{n}{k}$ achieves its
maximum value when $k = \lfloor n/2 \rfloor$ or $k = \lceil n/2 \rceil$.

C.1-11 *

Argue that for any integers $n \ge 0$, $j \ge 0$, $k \ge 0$, and $j + k \le n$,

$$\binom{n}{j+k} \le \binom{n}{j} \binom{n-j}{k} . \tag{C.9}$$

Provide both an algebraic proof and an argument based on a method for choosing
j + k items out of n. Give an example in which equality does not hold.

C.1-12 *

Use induction on all integers k such that $0 \le k \le n/2$ to prove inequality (C.6),
and use equation (C.3) to extend it to all integers k such that $0 \le k \le n$.

C.1-13 *

Use  Stirling’s  approximation  to  prove  that 


(C.10) 


C.1-14 *

By differentiating the entropy function $H(\lambda)$, show that it achieves its maximum
value at $\lambda = 1/2$. What is $H(1/2)$?

C.1-15 *

Show  that  for  any  integer  n  >  0, 


(C.ll) 


C.2  Probability 


Probability is an essential tool for the design and analysis of probabilistic and
randomized algorithms. This section reviews basic probability theory.

We define probability in terms of a sample space S, which is a set whose
elements are called elementary events. We can think of each elementary event as a
possible outcome of an experiment. For the experiment of flipping two distinguishable
coins, with each individual flip resulting in a head (H) or a tail (T), we can view
the sample space as consisting of the set of all possible 2-strings over {H, T}:


S = {HH, HT, TH, TT} .

An event is a subset¹ of the sample space S. For example, in the experiment of
flipping two coins, the event of obtaining one head and one tail is {HT, TH}. The
event S is called the certain event, and the event ∅ is called the null event. We say
that two events A and B are mutually exclusive if $A \cap B = \emptyset$. We sometimes treat
an elementary event $s \in S$ as the event {s}. By definition, all elementary events
are mutually exclusive.

Axioms  of  probability 

A probability distribution Pr{} on a sample space S is a mapping from events of S
to real numbers satisfying the following probability axioms:


1. $\Pr\{A\} \ge 0$ for any event A.

2. $\Pr\{S\} = 1$.

3. $\Pr\{A \cup B\} = \Pr\{A\} + \Pr\{B\}$ for any two mutually exclusive events A
and B. More generally, for any (finite or countably infinite) sequence of events
$A_1, A_2, \ldots$ that are pairwise mutually exclusive,

$$\Pr\Bigl\{ \bigcup_i A_i \Bigr\} = \sum_i \Pr\{A_i\} .$$

We  call  Pr  {A}  the  probability  of  the  event  A.  We  note  here  that  axiom  2  is  a 
normalization  requirement:  there  is  really  nothing  fundamental  about  choosing  1 
as  the  probability  of  the  certain  event,  except  that  it  is  natural  and  convenient. 

Several results follow immediately from these axioms and basic set theory (see
Section B.1). The null event ∅ has probability $\Pr\{\emptyset\} = 0$. If $A \subseteq B$, then
$\Pr\{A\} \le \Pr\{B\}$. Using $\overline{A}$ to denote the event S − A (the complement of A),
we have $\Pr\{\overline{A}\} = 1 - \Pr\{A\}$. For any two events A and B,

$$\Pr\{A \cup B\} = \Pr\{A\} + \Pr\{B\} - \Pr\{A \cap B\} \tag{C.12}$$

$$\Pr\{A \cup B\} \le \Pr\{A\} + \Pr\{B\} . \tag{C.13}$$

¹ For a general probability distribution, there may be some subsets of the sample space S that are not
considered to be events. This situation usually arises when the sample space is uncountably infinite.
The main requirement for what subsets are events is that the set of events of a sample space be closed
under the operations of taking the complement of an event, forming the union of a finite or countable
number of events, and taking the intersection of a finite or countable number of events. Most of
the probability distributions we shall see are over finite or countable sample spaces, and we shall
generally consider all subsets of a sample space to be events. A notable exception is the continuous
uniform probability distribution, which we shall see shortly.

In our coin-flipping example, suppose that each of the four elementary events
has probability 1/4. Then the probability of getting at least one head is

$$\Pr\{\mathrm{HH}, \mathrm{HT}, \mathrm{TH}\} = \Pr\{\mathrm{HH}\} + \Pr\{\mathrm{HT}\} + \Pr\{\mathrm{TH}\} = 3/4 .$$

Alternatively, since the probability of getting strictly less than one head is
$\Pr\{\mathrm{TT}\} = 1/4$, the probability of getting at least one head is 1 − 1/4 = 3/4.
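
Both calculations can be reproduced by enumerating the sample space. This is a sketch of our own in Python. The names `S`, `pr`, and `prob` are illustrative, and exact arithmetic uses `fractions.Fraction`.

```python
from fractions import Fraction
from itertools import product

# Sample space for two distinguishable coins; each elementary event has probability 1/4.
S = ["".join(s) for s in product("HT", repeat=2)]
pr = {s: Fraction(1, len(S)) for s in S}

def prob(event):
    """Probability of an event, summed over its elementary events (axiom 3)."""
    return sum(pr[s] for s in event)

at_least_one_head = {s for s in S if "H" in s}
print(prob(at_least_one_head))   # direct sum over HH, HT, TH
print(1 - prob({"TT"}))          # complement rule
```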

Discrete  probability  distributions 

A probability distribution is discrete if it is defined over a finite or countably infinite
sample space. Let S be the sample space. Then for any event A,

$$\Pr\{A\} = \sum_{s \in A} \Pr\{s\} ,$$

since elementary events, specifically those in A, are mutually exclusive. If S is
finite and every elementary event $s \in S$ has probability

$$\Pr\{s\} = 1/|S| ,$$

then we have the uniform probability distribution on S. In such a case the
experiment is often described as “picking an element of S at random.”

As an example, consider the process of flipping a fair coin, one for which the
probability of obtaining a head is the same as the probability of obtaining a tail, that
is, 1/2. If we flip the coin n times, we have the uniform probability distribution
defined on the sample space $S = \{\mathrm{H}, \mathrm{T}\}^n$, a set of size $2^n$. We can represent each
elementary event in S as a string of length n over {H, T}, each string occurring with
probability $1/2^n$. The event

A = {exactly k heads and exactly n − k tails occur}

is a subset of S of size $|A| = \binom{n}{k}$, since $\binom{n}{k}$ strings of length n over {H, T} contain
exactly k H’s. The probability of event A is thus $\Pr\{A\} = \binom{n}{k} / 2^n$.
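
As a sanity check of our own (the function names are hypothetical), the following Python sketch compares the closed form $\binom{n}{k}/2^n$ with a brute-force count over all $2^n$ strings.

```python
from fractions import Fraction
from itertools import product
from math import comb

def prob_exactly_k_heads(n, k):
    """Closed form: C(n, k) / 2^n under the uniform distribution on {H, T}^n."""
    return Fraction(comb(n, k), 2 ** n)

def prob_by_enumeration(n, k):
    """Count the strings with exactly k H's among all 2^n equally likely strings."""
    outcomes = list(product("HT", repeat=n))
    favorable = sum(1 for s in outcomes if s.count("H") == k)
    return Fraction(favorable, len(outcomes))

print(prob_exactly_k_heads(4, 2), prob_by_enumeration(4, 2))
```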

Continuous  uniform  probability  distribution 

The continuous uniform probability distribution is an example of a probability
distribution in which not all subsets of the sample space are considered to be
events. The continuous uniform probability distribution is defined over a closed
interval [a, b] of the reals, where a < b. Our intuition is that each point in the
interval [a, b] should be “equally likely.” There are an uncountable number of points,
however, so if we give all points the same finite, positive probability, we cannot
simultaneously satisfy axioms 2 and 3. For this reason, we would like to associate a
probability only with some of the subsets of S, in such a way that the axioms are
satisfied for these events.

For any closed interval [c, d], where $a \le c \le d \le b$, the continuous uniform
probability distribution defines the probability of the event [c, d] to be

$$\Pr\{[c, d]\} = \frac{d - c}{b - a} .$$

Note that for any point x = [x, x], the probability of x is 0. If we remove
the endpoints of an interval [c, d], we obtain the open interval (c, d). Since
$[c, d] = [c, c] \cup (c, d) \cup [d, d]$, axiom 3 gives us $\Pr\{[c, d]\} = \Pr\{(c, d)\}$.
Generally, the set of events for the continuous uniform probability distribution contains
any subset of the sample space [a, b] that can be obtained by a finite or countable
union of open and closed intervals, as well as certain more complicated sets.


Conditional  probability  and  independence 


Sometimes we have some prior partial knowledge about the outcome of an
experiment. For example, suppose that a friend has flipped two fair coins and has told
you  that  at  least  one  of  the  coins  showed  a  head.  What  is  the  probability  that  both 
coins  are  heads?  The  information  given  eliminates  the  possibility  of  two  tails.  The 
three  remaining  elementary  events  are  equally  likely,  so  we  infer  that  each  occurs 
with  probability  1/3.  Since  only  one  of  these  elementary  events  shows  two  heads, 
the  answer  to  our  question  is  1/3. 

Conditional probability formalizes the notion of having prior partial knowledge
of the outcome of an experiment. The conditional probability of an event A given
that another event B occurs is defined to be

$$\Pr\{A \mid B\} = \frac{\Pr\{A \cap B\}}{\Pr\{B\}} \tag{C.14}$$

whenever $\Pr\{B\} \ne 0$. (We read “$\Pr\{A \mid B\}$” as “the probability of A given B.”)
Intuitively, since we are given that event B occurs, the event that A also occurs
is $A \cap B$. That is, $A \cap B$ is the set of outcomes in which both A and B occur.
Because the outcome is one of the elementary events in B, we normalize the
probabilities of all the elementary events in B by dividing them by $\Pr\{B\}$, so that they
sum to 1. The conditional probability of A given B is, therefore, the ratio of the
probability of event $A \cap B$ to the probability of event B. In the example above, A
is the event that both coins are heads, and B is the event that at least one coin is a
head. Thus, $\Pr\{A \mid B\} = (1/4)/(3/4) = 1/3$.
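
Definition (C.14) translates directly into a few lines of Python. The sketch below is our own illustration, and `cond_prob` is a hypothetical helper name. It assumes the uniform distribution on the two-coin sample space.

```python
from fractions import Fraction
from itertools import product

# Uniform distribution on the sample space of two fair coins.
S = {"".join(s) for s in product("HT", repeat=2)}

def prob(event):
    return Fraction(len(event), len(S))

def cond_prob(A, B):
    """Pr{A | B} = Pr{A and B} / Pr{B}, defined only when Pr{B} != 0."""
    if prob(B) == 0:
        raise ValueError("conditioning event has probability 0")
    return prob(A & B) / prob(B)

A = {"HH"}                          # both coins are heads
B = {s for s in S if "H" in s}      # at least one coin is a head
print(cond_prob(A, B))
```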

Two events are independent if

$$\Pr\{A \cap B\} = \Pr\{A\} \Pr\{B\} , \tag{C.15}$$

which is equivalent, if $\Pr\{B\} \ne 0$, to the condition

$$\Pr\{A \mid B\} = \Pr\{A\} .$$

For example, suppose that we flip two fair coins and that the outcomes are
independent. Then the probability of two heads is (1/2)(1/2) = 1/4. Now suppose
that one event is that the first coin comes up heads and the other event is that the
coins come up differently. Each of these events occurs with probability 1/2, and
the probability that both events occur is 1/4; thus, according to the definition of
independence, the events are independent, even though you might think that both
events depend on the first coin. Finally, suppose that the coins are welded
together so that they both fall heads or both fall tails and that the two possibilities are
equally likely. Then the probability that each coin comes up heads is 1/2, but the
probability that they both come up heads is $1/2 \ne (1/2)(1/2)$. Consequently, the
event that one comes up heads and the event that the other comes up heads are not
independent.

A collection $A_1, A_2, \ldots, A_n$ of events is said to be pairwise independent if

$$\Pr\{A_i \cap A_j\} = \Pr\{A_i\} \Pr\{A_j\}$$

for all $1 \le i < j \le n$. We say that the events of the collection are (mutually)
independent if every k-subset $A_{i_1}, A_{i_2}, \ldots, A_{i_k}$ of the collection, where $2 \le k \le n$
and $1 \le i_1 < i_2 < \cdots < i_k \le n$, satisfies

$$\Pr\{A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}\} = \Pr\{A_{i_1}\} \Pr\{A_{i_2}\} \cdots \Pr\{A_{i_k}\} .$$

For example, suppose we flip two fair coins. Let $A_1$ be the event that the first coin
is heads, let $A_2$ be the event that the second coin is heads, and let $A_3$ be the event
that the two coins are different. We have

$\Pr\{A_1\} = 1/2$,

$\Pr\{A_2\} = 1/2$,

$\Pr\{A_3\} = 1/2$,

$\Pr\{A_1 \cap A_2\} = 1/4$,

$\Pr\{A_1 \cap A_3\} = 1/4$,

$\Pr\{A_2 \cap A_3\} = 1/4$,

$\Pr\{A_1 \cap A_2 \cap A_3\} = 0$.

Since for $1 \le i < j \le 3$, we have $\Pr\{A_i \cap A_j\} = \Pr\{A_i\} \Pr\{A_j\} = 1/4$, the
events $A_1$, $A_2$, and $A_3$ are pairwise independent. The events are not mutually
independent, however, because $\Pr\{A_1 \cap A_2 \cap A_3\} = 0$ and
$\Pr\{A_1\} \Pr\{A_2\} \Pr\{A_3\} = 1/8 \ne 0$.
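
The distinction between pairwise and mutual independence can be verified mechanically. The following Python sketch is our own illustration, and `is_independent` is a hypothetical helper. It tests the product rule on every pair and then on the full triple.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

S = list(product("HT", repeat=2))   # uniform sample space for two fair coins

def prob(event):
    return Fraction(len(event), len(S))

A1 = {s for s in S if s[0] == "H"}      # first coin is heads
A2 = {s for s in S if s[1] == "H"}      # second coin is heads
A3 = {s for s in S if s[0] != s[1]}     # the two coins differ
events = [A1, A2, A3]

def is_independent(subset):
    """Check the product rule for one particular sub-collection of events."""
    intersection = set(S).intersection(*subset)
    return prob(intersection) == prod(prob(e) for e in subset)

pairwise = all(is_independent(pair) for pair in combinations(events, 2))
mutual = pairwise and is_independent(events)
print(pairwise, mutual)
```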

Bayes’s theorem


From the definition of conditional probability (C.14) and the commutative law
$A \cap B = B \cap A$, it follows that for two events A and B, each with nonzero
probability,

$$\Pr\{A \cap B\} = \Pr\{B\} \Pr\{A \mid B\} = \Pr\{A\} \Pr\{B \mid A\} . \tag{C.16}$$


Solving for $\Pr\{A \mid B\}$, we obtain

$$\Pr\{A \mid B\} = \frac{\Pr\{A\} \Pr\{B \mid A\}}{\Pr\{B\}} , \tag{C.17}$$

which is known as Bayes’s theorem. The denominator $\Pr\{B\}$ is a normalizing
constant, which we can reformulate as follows. Since $B = (B \cap A) \cup (B \cap \overline{A})$,
and since $B \cap A$ and $B \cap \overline{A}$ are mutually exclusive events,

$$\Pr\{B\} = \Pr\{B \cap A\} + \Pr\{B \cap \overline{A}\} = \Pr\{A\} \Pr\{B \mid A\} + \Pr\{\overline{A}\} \Pr\{B \mid \overline{A}\} .$$


Substituting into equation (C.17), we obtain an equivalent form of Bayes’s
theorem:

$$\Pr\{A \mid B\} = \frac{\Pr\{A\} \Pr\{B \mid A\}}{\Pr\{A\} \Pr\{B \mid A\} + \Pr\{\overline{A}\} \Pr\{B \mid \overline{A}\}} . \tag{C.18}$$


Bayes’s theorem can simplify the computing of conditional probabilities. For
example, suppose that we have a fair coin and a biased coin that always comes up
heads. We run an experiment consisting of three independent events: we choose
one of the two coins at random, we flip that coin once, and then we flip it again.
Suppose that the coin we have chosen comes up heads both times. What is the
probability that it is biased?

We solve this problem using Bayes’s theorem. Let A be the event that we choose
the biased coin, and let B be the event that the chosen coin comes up heads both
times. We wish to determine $\Pr\{A \mid B\}$. We have $\Pr\{A\} = 1/2$, $\Pr\{B \mid A\} = 1$,
$\Pr\{\overline{A}\} = 1/2$, and $\Pr\{B \mid \overline{A}\} = 1/4$; hence,

$$\Pr\{A \mid B\} = \frac{(1/2) \cdot 1}{(1/2) \cdot 1 + (1/2) \cdot (1/4)} = 4/5 .$$
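
The arithmetic in this example follows form (C.18) of Bayes’s theorem. As a sketch of our own (the function `bayes` and its parameter names are hypothetical), it can be written with exact fractions:

```python
from fractions import Fraction

def bayes(prior_a, b_given_a, b_given_not_a):
    """Pr{A | B} via equation (C.18), given Pr{A}, Pr{B | A}, and Pr{B | not A}."""
    prior_not_a = 1 - prior_a
    numerator = prior_a * b_given_a
    return numerator / (numerator + prior_not_a * b_given_not_a)

# A: the biased (always-heads) coin was chosen; B: two heads in two flips.
posterior = bayes(Fraction(1, 2), Fraction(1), Fraction(1, 4))
print(posterior)
```

Under the same assumptions, observing a third head would correspond to replacing 1/4 by 1/8, which pushes the posterior closer to 1.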


Exercises 


C.2-1 

Professor  Rosencrantz  flips  a  fair  coin  once.  Professor  Guildenstern  flips  a  fair 
coin  twice.  What  is  the  probability  that  Professor  Rosencrantz  obtains  more  heads 
than  Professor  Guildenstern? 

C.2-2

Prove Boole’s inequality: For any finite or countably infinite sequence of events
$A_1, A_2, \ldots$,

$$\Pr\{A_1 \cup A_2 \cup \cdots\} \le \Pr\{A_1\} + \Pr\{A_2\} + \cdots . \tag{C.19}$$


C.2-3 

Suppose  we  shuffle  a  deck  of  10  cards,  each  bearing  a  distinct  number  from  1  to  10, 
to  mix  the  cards  thoroughly.  We  then  remove  three  cards,  one  at  a  time,  from  the 
deck.  What  is  the  probability  that  we  select  the  three  cards  in  sorted  (increasing) 
order? 


C.2-4 

Prove  that 

$$\Pr\{A \mid B\} + \Pr\{\overline{A} \mid B\} = 1 .$$


C.2-5

Prove that for any collection of events $A_1, A_2, \ldots, A_n$,

$$\Pr\{A_1 \cap A_2 \cap \cdots \cap A_n\} = \Pr\{A_1\} \cdot \Pr\{A_2 \mid A_1\} \cdot \Pr\{A_3 \mid A_1 \cap A_2\} \cdots \Pr\{A_n \mid A_1 \cap A_2 \cap \cdots \cap A_{n-1}\} .$$


C.2-6 *

Describe a procedure that takes as input two integers a and b such that 0 < a < b
and, using fair coin flips, produces as output heads with probability a/b and tails
with probability (b − a)/b. Give a bound on the expected number of coin flips,
which should be O(1). (Hint: Represent a/b in binary.)

C.2-7  * 

Show  how  to  construct  a  set  of  n  events  that  are  pairwise  independent  but  such  that 
no  subset  of  k  >  2  of  them  is  mutually  independent. 

C.2-8  * 

Two events A and B are conditionally independent, given C, if

$$\Pr\{A \cap B \mid C\} = \Pr\{A \mid C\} \cdot \Pr\{B \mid C\} .$$

Give  a  simple  but  nontrivial  example  of  two  events  that  are  not  independent  but  are 
conditionally  independent  given  a  third  event. 

C.2-9  * 

You  are  a  contestant  in  a  game  show  in  which  a  prize  is  hidden  behind  one  of 
three  curtains.  You  will  win  the  prize  if  you  select  the  correct  curtain.  After  you 
have picked one curtain but before the curtain is lifted, the emcee lifts one of the
other  curtains,  knowing  that  it  will  reveal  an  empty  stage,  and  asks  if  you  would 
like  to  switch  from  your  current  selection  to  the  remaining  curtain.  How  would 
your  chances  change  if  you  switch?  (This  question  is  the  celebrated  Monty  Hall 
problem ,  named  after  a  game-show  host  who  often  presented  contestants  with  just 
this  dilemma.) 

C.2-10  * 

A prison warden has randomly picked one prisoner among three to go free. The
other two will be executed. The guard knows which one will go free but is
forbidden to give any prisoner information regarding his status. Let us call the prisoners
X, Y, and Z. Prisoner X asks the guard privately which of Y or Z will be
executed, arguing that since he already knows that at least one of them must die, the
guard won’t be revealing any information about his own status. The guard tells X
that Y is to be executed. Prisoner X feels happier now, since he figures that either
he or prisoner Z will go free, which means that his probability of going free is
now 1/2. Is he right, or are his chances still 1/3? Explain.


C.3  Discrete  random  variables 

A (discrete) random variable X is a function from a finite or countably infinite
sample space S to the real numbers. It associates a real number with each possible
outcome of an experiment, which allows us to work with the probability
distribution induced on the resulting set of numbers. Random variables can also be defined
for uncountably infinite sample spaces, but they raise technical issues that are
unnecessary to address for our purposes. Henceforth, we shall assume that random
variables are discrete.

For a random variable X and a real number x, we define the event X = x to be
$\{s \in S : X(s) = x\}$; thus,

$$\Pr\{X = x\} = \sum_{s \in S : X(s) = x} \Pr\{s\} .$$

The function

$$f(x) = \Pr\{X = x\}$$

is the probability density function of the random variable X. From the probability
axioms, $\Pr\{X = x\} \ge 0$ and $\sum_x \Pr\{X = x\} = 1$.

As an example, consider the experiment of rolling a pair of ordinary, 6-sided
dice. There are 36 possible elementary events in the sample space. We assume
that the probability distribution is uniform, so that each elementary event $s \in S$ is
equally likely: $\Pr\{s\} = 1/36$. Define the random variable X to be the maximum of
the two values showing on the dice. We have $\Pr\{X = 3\} = 5/36$, since X assigns
a value of 3 to 5 of the 36 possible elementary events, namely, (1, 3), (2, 3), (3, 3),
(3, 2), and (3, 1).
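
The induced distribution of X can be tabulated by enumerating the 36 elementary events. This Python sketch is our own illustration, and the names `X` and `density` are chosen to mirror the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Uniform sample space for a pair of 6-sided dice.
S = list(product(range(1, 7), repeat=2))

def X(s):
    """Random variable: the maximum of the two values showing."""
    return max(s)

# f(x) = Pr{X = x}: sum Pr{s} = 1/36 over all s with X(s) = x.
counts = Counter(X(s) for s in S)
density = {x: Fraction(c, len(S)) for x, c in counts.items()}
print(density[3])
print(sum(density.values()))
```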

We often define several random variables on the same sample space. If X and Y
are random variables, the function

\[ f(x, y) = \Pr\{X = x \text{ and } Y = y\} \]

is the joint probability density function of X and Y. For a fixed value y,

\[ \Pr\{Y = y\} = \sum_x \Pr\{X = x \text{ and } Y = y\} , \]

and similarly, for a fixed value x,

\[ \Pr\{X = x\} = \sum_y \Pr\{X = x \text{ and } Y = y\} . \]

Using the definition (C.14) of conditional probability, we have

\[ \Pr\{X = x \mid Y = y\} = \frac{\Pr\{X = x \text{ and } Y = y\}}{\Pr\{Y = y\}} . \]

We define two random variables X and Y to be independent if for all x and y, the
events X = x and Y = y are independent or, equivalently, if for all x and y, we
have \(\Pr\{X = x \text{ and } Y = y\} = \Pr\{X = x\} \Pr\{Y = y\}\).
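As an illustration of joint distributions, marginals, and the independence test, here is a small Python sketch on the two-dice sample space. The helper names (`joint`, `marginals`, `independent`) are our own choices for this example.

```python
from fractions import Fraction
from itertools import product

SPACE = list(product(range(1, 7), repeat=2))  # two fair dice
PR = Fraction(1, len(SPACE))                  # uniform probability of each s

def joint(f, g):
    """Joint distribution {(x, y): Pr{X = x and Y = y}} for X = f(s), Y = g(s)."""
    table = {}
    for s in SPACE:
        key = (f(s), g(s))
        table[key] = table.get(key, Fraction(0)) + PR
    return table

def marginals(table):
    """Sum the joint distribution over y (resp. x) to get Pr{X = x} and Pr{Y = y}."""
    px, py = {}, {}
    for (x, y), pr in table.items():
        px[x] = px.get(x, Fraction(0)) + pr
        py[y] = py.get(y, Fraction(0)) + pr
    return px, py

def independent(table):
    """Check Pr{X = x and Y = y} = Pr{X = x} Pr{Y = y} for all x and y."""
    px, py = marginals(table)
    return all(table.get((x, y), Fraction(0)) == px[x] * py[y]
               for x in px for y in py)

first = lambda s: s[0]    # value on the first die
second = lambda s: s[1]   # value on the second die

print(independent(joint(first, second)))  # the two dice
print(independent(joint(first, max)))     # first die versus the maximum
```

The two dice pass the test, while the first die and the maximum do not: for instance, the event "first die is 6 and maximum is 1" has probability 0 even though both marginal probabilities are positive.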

Given  a  set  of  random  variables  defined  over  the  same  sample  space,  we  can 
define  new  random  variables  as  sums,  products,  or  other  functions  of  the  original 
variables. 


Expected  value  of  a  random  variable 

The  simplest  and  most  useful  summary  of  the  distribution  of  a  random  variable  is 
the  “average”  of  the  values  it  takes  on.  The  expected  value  (or,  synonymously, 
expectation  or  mean)  of  a  discrete  random  variable  X  is 

\[ \mathrm{E}[X] = \sum_x x \cdot \Pr\{X = x\} , \tag{C.20} \]

which is well defined if the sum is finite or converges absolutely. Sometimes the
expectation of X is denoted by \(\mu_X\) or, when the random variable is apparent from
context, simply by \(\mu\).

Consider a game in which you flip two fair coins. You earn $3 for each head but
lose $2 for each tail. The expected value of the random variable X representing
your earnings is

\begin{align*}
\mathrm{E}[X] &= 6 \cdot \Pr\{2 \text{ H's}\} + 1 \cdot \Pr\{1 \text{ H}, 1 \text{ T}\} - 4 \cdot \Pr\{2 \text{ T's}\} \\
&= 6(1/4) + 1(1/2) - 4(1/4) \\
&= 1 .
\end{align*}
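The coin game can also be evaluated by enumerating all equally likely outcomes. The sketch below is a minimal illustration; the function name and its parameters are hypothetical and simply encode the payoffs stated above.

```python
from fractions import Fraction
from itertools import product

def expected_earnings(head_gain=3, tail_loss=2, flips=2):
    """Expected earnings when each head pays head_gain and each tail costs tail_loss."""
    outcomes = list(product("HT", repeat=flips))  # equally likely flip sequences
    total = Fraction(0)
    for outcome in outcomes:
        earnings = sum(head_gain if c == "H" else -tail_loss for c in outcome)
        total += Fraction(earnings, len(outcomes))  # x * Pr{outcome}
    return total

print(expected_earnings())
```

With the default payoffs the enumeration gives an expected value of 1, agreeing with the calculation above.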

The expectation of the sum of two random variables is the sum of their expectations,
that is,

\[ \mathrm{E}[X + Y] = \mathrm{E}[X] + \mathrm{E}[Y] , \tag{C.21} \]

whenever E[X] and E[Y] are defined. We call this property linearity of expectation,
and it holds even if X and Y are not independent. It also extends to finite and
absolutely convergent summations of expectations. Linearity of expectation is the
key property that enables us to perform probabilistic analyses by using indicator
random variables (see Section 5.2).
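To see that linearity does not require independence, the following sketch takes two clearly dependent random variables on the two-dice sample space, the maximum and the minimum of the dice, and compares E[X + Y] with E[X] + E[Y]. The choice of variables and the helper name `expectation` are ours.

```python
from fractions import Fraction
from itertools import product

ROLLS = list(product(range(1, 7), repeat=2))  # two fair dice, uniform

def expectation(f):
    """E[f(s)] under the uniform distribution on ROLLS."""
    return sum(Fraction(f(s), len(ROLLS)) for s in ROLLS)

e_max = expectation(max)                               # E[X], X = max of the dice
e_min = expectation(min)                               # E[Y], Y = min of the dice
e_sum = expectation(lambda s: max(s) + min(s))         # E[X + Y]
e_prod = expectation(lambda s: max(s) * min(s))        # E[XY]

print(e_sum == e_max + e_min)    # linearity holds despite dependence
print(e_prod == e_max * e_min)   # the product rule needs independence
```

The first comparison succeeds and the second fails, which is the distinction between linearity of expectation and the product rule for independent variables.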

If X is any random variable, any function g(x) defines a new random variable
g(X). If the expectation of g(X) is defined, then

\[ \mathrm{E}[g(X)] = \sum_x g(x) \cdot \Pr\{X = x\} . \]

Letting g(x) = ax, we have for any constant a,

\[ \mathrm{E}[aX] = a\,\mathrm{E}[X] . \tag{C.22} \]

Consequently, expectations are linear: for any two random variables X and Y and
any constant a,

\[ \mathrm{E}[aX + Y] = a\,\mathrm{E}[X] + \mathrm{E}[Y] . \tag{C.23} \]


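When two random variables X and Y are independent and each has a defined
expectation,

\begin{align*}
\mathrm{E}[XY] &= \sum_x \sum_y xy \cdot \Pr\{X = x \text{ and } Y = y\} \\
&= \sum_x \sum_y xy \cdot \Pr\{X = x\} \Pr\{Y = y\} \\
&= \Bigl( \sum_x x \cdot \Pr\{X = x\} \Bigr) \Bigl( \sum_y y \cdot \Pr\{Y = y\} \Bigr) \\
&= \mathrm{E}[X]\,\mathrm{E}[Y] .
\end{align*}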

In general, when n random variables \(X_1, X_2, \ldots, X_n\) are mutually independent,

\[ \mathrm{E}[X_1 X_2 \cdots X_n] = \mathrm{E}[X_1]\,\mathrm{E}[X_2] \cdots \mathrm{E}[X_n] . \tag{C.24} \]

When a random variable X takes on values from the set of natural numbers
\(\mathbb{N} = \{0, 1, 2, \ldots\}\), we have a nice formula for its expectation:

\begin{align*}
\mathrm{E}[X] &= \sum_{i=0}^{\infty} i \cdot \Pr\{X = i\} \\
&= \sum_{i=0}^{\infty} i \,(\Pr\{X \ge i\} - \Pr\{X \ge i + 1\}) \\
&= \sum_{i=1}^{\infty} \Pr\{X \ge i\} , \tag{C.25}
\end{align*}

since each term \(\Pr\{X \ge i\}\) is added in i times and subtracted out i - 1 times
(except \(\Pr\{X \ge 0\}\), which is added in 0 times and not subtracted out at all).
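The tail-sum formula (C.25) is easy to check on a small example. The sketch below, which assumes X is the maximum of two fair dice (our choice of example), compares the direct sum of i · Pr{X = i} with the sum of Pr{X ≥ i} for i ≥ 1.

```python
from fractions import Fraction
from itertools import product

ROLLS = list(product(range(1, 7), repeat=2))  # two fair dice, uniform

def pr(event):
    """Probability of an event given as a predicate on elementary events."""
    return Fraction(sum(1 for s in ROLLS if event(s)), len(ROLLS))

# Direct definition: sum over i of i * Pr{X = i}; X never exceeds 6.
direct = sum(i * pr(lambda s, i=i: max(s) == i) for i in range(0, 7))

# Tail-sum form (C.25): sum over i >= 1 of Pr{X >= i}.
tail_sum = sum(pr(lambda s, i=i: max(s) >= i) for i in range(1, 7))

print(direct, tail_sum)
```

Both sums stop at i = 6 because Pr{X ≥ i} = 0 beyond that point, so the infinite series reduces to a finite one here.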

When we apply a convex function f(x) to a random variable X, Jensen's
inequality gives us

\[ \mathrm{E}[f(X)] \ge f(\mathrm{E}[X]) , \tag{C.26} \]

provided that the expectations exist and are finite. (A function f(x) is convex
if for all x and y and for all \(0 \le \lambda \le 1\), we have
\(f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)\).)

Variance  and  standard  deviation 

The expected value of a random variable does not tell us how "spread out" the
variable's values are. For example, if we have random variables X and Y for which
Pr{X = 1/4} = Pr{X = 3/4} = 1/2 and Pr{Y = 0} = Pr{Y = 1} = 1/2,
then both E[X] and E[Y] are 1/2, yet the actual values taken on by Y are farther
from the mean than the actual values taken on by X.

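The notion of variance mathematically expresses how far from the mean a random
variable's values are likely to be. The variance of a random variable X with
mean E[X] is

\begin{align*}
\mathrm{Var}[X] &= \mathrm{E}\bigl[(X - \mathrm{E}[X])^2\bigr] \\
&= \mathrm{E}\bigl[X^2 - 2X\,\mathrm{E}[X] + \mathrm{E}^2[X]\bigr] \\
&= \mathrm{E}[X^2] - 2\,\mathrm{E}\bigl[X\,\mathrm{E}[X]\bigr] + \mathrm{E}^2[X] \\
&= \mathrm{E}[X^2] - 2\,\mathrm{E}^2[X] + \mathrm{E}^2[X] \\
&= \mathrm{E}[X^2] - \mathrm{E}^2[X] . \tag{C.27}
\end{align*}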
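The two forms of the variance can be compared on the X and Y of the example at the start of this subsection. This is a minimal sketch; the dictionary representation of a distribution and the function names are our own.

```python
from fractions import Fraction

def expectation(dist, g=lambda v: v):
    """E[g(X)] for a distribution given as {value: probability}."""
    return sum(g(v) * pr for v, pr in dist.items())

def variance_by_definition(dist):
    """Var[X] = E[(X - E[X])^2]."""
    mu = expectation(dist)
    return expectation(dist, lambda v: (v - mu) ** 2)

def variance_by_c27(dist):
    """Var[X] = E[X^2] - E^2[X], the last line of (C.27)."""
    return expectation(dist, lambda v: v * v) - expectation(dist) ** 2

half = Fraction(1, 2)
X = {Fraction(1, 4): half, Fraction(3, 4): half}
Y = {Fraction(0): half, Fraction(1): half}

for name, dist in (("X", X), ("Y", Y)):
    print(name, expectation(dist), variance_by_definition(dist), variance_by_c27(dist))
```

Both variables have mean 1/2, but Var[X] = 1/16 while Var[Y] = 1/4, which quantifies the observation that Y's values lie farther from the mean.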

To justify the equality \(\mathrm{E}[\mathrm{E}^2[X]] = \mathrm{E}^2[X]\), note that because E[X] is a real
number and not a random variable, so is \(\mathrm{E}^2[X]\). The equality \(\mathrm{E}[X\,\mathrm{E}[X]] = \mathrm{E}^2[X]\)
follows from equation (C.22), with a = E[X]. Rewriting equation (C.27) yields
an expression for the expectation of the square of a random variable:

\[ \mathrm{E}[X^2] = \mathrm{Var}[X] + \mathrm{E}^2[X] . \tag{C.28} \]

The variance of a random variable X and the variance of aX are related (see
Exercise C.3-10):

\[ \mathrm{Var}[aX] = a^2\,\mathrm{Var}[X] . \]

When X and Y are independent random variables,

\[ \mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] . \]

In general, if n random variables \(X_1, X_2, \ldots, X_n\) are pairwise independent, then

\[ \mathrm{Var}\Bigl[ \sum_{i=1}^{n} X_i \Bigr] = \sum_{i=1}^{n} \mathrm{Var}[X_i] . \tag{C.29} \]

The standard deviation of a random variable X is the nonnegative square root
of the variance of X. The standard deviation of a random variable X is sometimes
denoted \(\sigma_X\) or simply \(\sigma\) when the random variable X is understood from context.
With this notation, the variance of X is denoted \(\sigma^2\).


Exercises 


C.3-1 

Suppose  we  roll  two  ordinary,  6-sided  dice.  What  is  the  expectation  of  the  sum 
of  the  two  values  showing?  What  is  the  expectation  of  the  maximum  of  the  two 
values  showing? 


C.3-2 

An array A[1..n] contains n distinct numbers that are randomly ordered, with each
permutation  of  the  n  numbers  being  equally  likely.  What  is  the  expectation  of  the 
index  of  the  maximum  element  in  the  array?  What  is  the  expectation  of  the  index 
of  the  minimum  element  in  the  array? 


C.3-3 

A  carnival  game  consists  of  three  dice  in  a  cage.  A  player  can  bet  a  dollar  on  any 
of  the  numbers  1  through  6.  The  cage  is  shaken,  and  the  payoff  is  as  follows.  If  the 
player’s  number  doesn’t  appear  on  any  of  the  dice,  he  loses  his  dollar.  Otherwise, 
if his number appears on exactly k of the three dice, for k = 1, 2, 3, he keeps his
dollar  and  wins  k  more  dollars.  What  is  his  expected  gain  from  playing  the  carnival 
game once?

C.3-4

Argue that if X and Y are nonnegative random variables, then

\[ \mathrm{E}[\max(X, Y)] \le \mathrm{E}[X] + \mathrm{E}[Y] . \]

C.3-5 *

Let X and Y be independent random variables. Prove that f(X) and g(Y) are
independent for any choice of functions f and g.

C.3-6 *

Let X be a nonnegative random variable, and suppose that E[X] is well defined.
Prove Markov's inequality:

\[ \Pr\{X \ge t\} \le \mathrm{E}[X]/t \tag{C.30} \]

for all t > 0.

C.3-7 *

Let S be a sample space, and let X and X' be random variables such that
\(X(s) \ge X'(s)\) for all \(s \in S\). Prove that for any real constant t,

\[ \Pr\{X \ge t\} \ge \Pr\{X' \ge t\} . \]


C.3-8 

Which  is  larger:  the  expectation  of  the  square  of  a  random  variable,  or  the  square 
of  its  expectation? 


C.3-9 

Show that for any random variable X that takes on only the values 0 and 1, we have
Var[X] = E[X] E[1 - X].

C.3-10

Prove that \(\mathrm{Var}[aX] = a^2\,\mathrm{Var}[X]\) from the definition (C.27) of variance.


C.4  The  geometric  and  binomial  distributions 

We can think of a coin flip as an instance of a Bernoulli trial, which is an
experiment with only two possible outcomes: success, which occurs with probability p,
and failure, which occurs with probability q = 1 - p. When we speak of Bernoulli
trials collectively, we mean that the trials are mutually independent and, unless we
specifically say otherwise, that each has the same probability p for success. Two
important distributions arise from Bernoulli trials: the geometric distribution and
the binomial distribution.

Figure C.1 A geometric distribution with probability p = 1/3 of success and a probability
q = 1 - p of failure. The expectation of the distribution is 1/p = 3.

The  geometric  distribution 

Suppose we have a sequence of Bernoulli trials, each with a probability p of success
and a probability q = 1 - p of failure. How many trials occur before we obtain
a success? Let us define the random variable X to be the number of trials needed to
obtain a success. Then X has values in the range {1, 2, ...}, and for \(k \ge 1\),

\[ \Pr\{X = k\} = q^{k-1} p , \tag{C.31} \]

since we have k - 1 failures before the one success. A probability distribution
satisfying equation (C.31) is said to be a geometric distribution. Figure C.1 illustrates
such a distribution.

Assuming that q < 1, we can calculate the expectation of a geometric distribution
using identity (A.8):

\begin{align*}
\mathrm{E}[X] &= \sum_{k=1}^{\infty} k q^{k-1} p \\
&= \frac{p}{q} \sum_{k=0}^{\infty} k q^{k} \\
&= \frac{p}{q} \cdot \frac{q}{(1 - q)^2} \\
&= \frac{p}{q} \cdot \frac{q}{p^2} \\
&= 1/p . \tag{C.32}
\end{align*}

Thus, on average, it takes 1/p trials before we obtain a success, an intuitive result.
The variance, which can be calculated similarly, but using Exercise A.1-3, is

\[ \mathrm{Var}[X] = q/p^2 . \tag{C.33} \]


As  an  example,  suppose  we  repeatedly  roll  two  dice  until  we  obtain  either  a 
seven  or  an  eleven.  Of  the  36  possible  outcomes,  6  yield  a  seven  and  2  yield  an 
eleven.  Thus,  the  probability  of  success  is  p  =  8/36  =  2/9,  and  we  must  roll 
1/p = 9/2 = 4.5 times on average to obtain a seven or eleven.
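The dice example can be reproduced numerically. The sketch below first counts the successful outcomes to obtain p, then approximates the mean and variance of the geometric distribution by truncating the infinite sums; the truncation length of 2000 terms is an arbitrary assumption that is more than enough for q = 7/9.

```python
from fractions import Fraction
from itertools import product

def success_probability():
    """Probability that two fair dice sum to seven or eleven."""
    rolls = list(product(range(1, 7), repeat=2))
    wins = sum(1 for a, b in rolls if a + b in (7, 11))
    return Fraction(wins, len(rolls))

def geometric_moments(p, terms=2000):
    """Approximate E[X] and Var[X] for Pr{X = k} = q^(k-1) p by truncated sums."""
    p = float(p)
    q = 1.0 - p
    mean = sum(k * q ** (k - 1) * p for k in range(1, terms + 1))
    second = sum(k * k * q ** (k - 1) * p for k in range(1, terms + 1))
    return mean, second - mean ** 2   # Var[X] = E[X^2] - E^2[X]

p = success_probability()
mean, var = geometric_moments(p)
print(p, mean, var)
print(float(1 / p), float((1 - p) / p ** 2))  # closed forms 1/p and q/p^2
```

The truncated sums agree with the closed forms 1/p and q/p² to within floating-point error.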


The  binomial  distribution 


How many successes occur during n Bernoulli trials, where a success occurs with
probability p and a failure with probability q = 1 - p? Define the random variable
X to be the number of successes in n trials. Then X has values in the range
{0, 1, ..., n}, and for k = 0, 1, ..., n,

\[ \Pr\{X = k\} = \binom{n}{k} p^k q^{n-k} , \tag{C.34} \]

since there are \(\binom{n}{k}\) ways to pick which k of the n trials are successes, and the
probability that each occurs is \(p^k q^{n-k}\). A probability distribution satisfying
equation (C.34) is said to be a binomial distribution. For convenience, we define the
family of binomial distributions using the notation

\[ b(k; n, p) = \binom{n}{k} p^k (1 - p)^{n-k} . \tag{C.35} \]

Figure C.2 illustrates a binomial distribution. The name "binomial" comes from the
right-hand side of equation (C.34) being the kth term of the expansion of \((p + q)^n\).
Consequently, since p + q = 1,

\[ \sum_{k=0}^{n} b(k; n, p) = 1 , \tag{C.36} \]

as axiom 2 of the probability axioms requires.

Figure C.2 The binomial distribution b(k; 15, 1/3) resulting from n = 15 Bernoulli trials, each
with probability p = 1/3 of success. The expectation of the distribution is np = 5.

We can compute the expectation of a random variable having a binomial distribution
from equations (C.8) and (C.36). Let X be a random variable that follows
the binomial distribution b(k; n, p), and let q = 1 - p. By the definition of
expectation, we have

\begin{align*}
\mathrm{E}[X] &= \sum_{k=0}^{n} k \cdot \Pr\{X = k\} \\
&= \sum_{k=0}^{n} k \cdot b(k; n, p) \\
&= \sum_{k=1}^{n} k \binom{n}{k} p^k q^{n-k} \\
&= np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} q^{n-k} && \text{(by equation (C.8))} \\
&= np \sum_{k=0}^{n-1} \binom{n-1}{k} p^{k} q^{(n-1)-k} \\
&= np \sum_{k=0}^{n-1} b(k; n-1, p) \\
&= np && \text{(by equation (C.36))} . \tag{C.37}
\end{align*}

By using the linearity of expectation, we can obtain the same result with
substantially less algebra. Let \(X_i\) be the random variable describing the number of
successes in the ith trial. Then \(\mathrm{E}[X_i] = p \cdot 1 + q \cdot 0 = p\), and by linearity of
expectation (equation (C.21)), the expected number of successes for n trials is

\begin{align*}
\mathrm{E}[X] &= \mathrm{E}\Bigl[ \sum_{i=1}^{n} X_i \Bigr] \\
&= \sum_{i=1}^{n} \mathrm{E}[X_i] \\
&= \sum_{i=1}^{n} p \\
&= np . \tag{C.38}
\end{align*}


We can use the same approach to calculate the variance of the distribution. Using
equation (C.27), we have \(\mathrm{Var}[X_i] = \mathrm{E}[X_i^2] - \mathrm{E}^2[X_i]\). Since \(X_i\) only takes on the
values 0 and 1, we have \(X_i^2 = X_i\), which implies \(\mathrm{E}[X_i^2] = \mathrm{E}[X_i] = p\). Hence,

\[ \mathrm{Var}[X_i] = p - p^2 = p(1 - p) = pq . \tag{C.39} \]

To compute the variance of X, we take advantage of the independence of the n
trials; thus, by equation (C.29),

\begin{align*}
\mathrm{Var}[X] &= \mathrm{Var}\Bigl[ \sum_{i=1}^{n} X_i \Bigr] \\
&= \sum_{i=1}^{n} \mathrm{Var}[X_i] \\
&= \sum_{i=1}^{n} pq \\
&= npq . \tag{C.40}
\end{align*}
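The three facts derived so far for the binomial distribution, namely that the probabilities sum to 1, that the mean is np, and that the variance is npq, can be confirmed with exact arithmetic. The sketch below uses the parameters of Figure C.2 (n = 15, p = 1/3); the function names are ours.

```python
from fractions import Fraction
from math import comb

def b(k, n, p):
    """Binomial probability b(k; n, p) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_summary(n, p):
    """Return (total probability, mean, variance) computed from the definition."""
    total = sum(b(k, n, p) for k in range(n + 1))
    mean = sum(k * b(k, n, p) for k in range(n + 1))
    second = sum(k * k * b(k, n, p) for k in range(n + 1))
    return total, mean, second - mean ** 2

n, p = 15, Fraction(1, 3)
total, mean, var = binomial_summary(n, p)
print(total, mean, var)
print(n * p, n * p * (1 - p))  # closed forms np and npq
```

Passing p as a `Fraction` keeps every term exact, so the comparison with np and npq is an equality rather than an approximation.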


As Figure C.2 shows, the binomial distribution b(k; n, p) increases with k until
it reaches the mean np, and then it decreases. We can prove that the distribution
always behaves in this manner by looking at the ratio of successive terms:

\begin{align*}
\frac{b(k; n, p)}{b(k-1; n, p)} &= \frac{\binom{n}{k} p^k q^{n-k}}{\binom{n}{k-1} p^{k-1} q^{n-k+1}} \\
&= \frac{n!\,(k-1)!\,(n-k+1)!\,p}{k!\,(n-k)!\,n!\,q} \\
&= \frac{(n-k+1)p}{kq} \tag{C.41} \\
&= 1 + \frac{(n+1)p - k}{kq} .
\end{align*}

This ratio is greater than 1 precisely when (n + 1)p - k is positive. Consequently,
b(k; n, p) > b(k - 1; n, p) for k < (n + 1)p (the distribution increases),
and b(k; n, p) < b(k - 1; n, p) for k > (n + 1)p (the distribution decreases).
If k = (n + 1)p is an integer, then b(k; n, p) = b(k - 1; n, p), and so the
distribution then has two maxima: at k = (n + 1)p and at k - 1 = (n + 1)p - 1 = np - q.
Otherwise, it attains a maximum at the unique integer k that lies in the range
np - q < k < (n + 1)p.
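The claim about where the maximum occurs can be checked directly. The following sketch (with our own helper `modes`) lists every k at which b(k; n, p) is largest, once for a case where (n + 1)p is not an integer and once for a case where it is.

```python
from fractions import Fraction
from math import comb

def b(k, n, p):
    """Binomial probability b(k; n, p) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def modes(n, p):
    """Return all k in 0..n at which b(k; n, p) attains its maximum."""
    values = [b(k, n, p) for k in range(n + 1)]
    peak = max(values)
    return [k for k, v in enumerate(values) if v == peak]

p = Fraction(1, 3)
print(modes(15, p))  # (n + 1)p = 16/3 is not an integer: one maximum
print(modes(8, p))   # (n + 1)p = 3 is an integer: two maxima
```

For n = 15 the single maximum is the integer between np - q = 13/3 and (n + 1)p = 16/3, and for n = 8 the two maxima sit at (n + 1)p - 1 and (n + 1)p.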

The  following  lemma  provides  an  upper  bound  on  the  binomial  distribution. 

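Lemma C.1

Let \(n \ge 0\), let 0 < p < 1, let q = 1 - p, and let \(0 \le k \le n\). Then

\[ b(k; n, p) \le \Bigl( \frac{np}{k} \Bigr)^{k} \Bigl( \frac{nq}{n-k} \Bigr)^{n-k} . \]

Proof Using equation (C.6), we have

\begin{align*}
b(k; n, p) &= \binom{n}{k} p^k q^{n-k} \\
&\le \Bigl( \frac{n}{k} \Bigr)^{k} \Bigl( \frac{n}{n-k} \Bigr)^{n-k} p^k q^{n-k} \\
&= \Bigl( \frac{np}{k} \Bigr)^{k} \Bigl( \frac{nq}{n-k} \Bigr)^{n-k} .
\end{align*}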

Exercises 


C.4-1 


Verify  axiom  2  of  the  probability  axioms  for  the  geometric  distribution. 


C.4-2 

How  many  times  on  average  must  we  flip  6  fair  coins  before  we  obtain  3  heads 
and  3  tails? 


C.4-3

Show that b(k; n, p) = b(n - k; n, q), where q = 1 - p.


C.4-4 

Show that the value of the maximum of the binomial distribution b(k; n, p) is
approximately \(1/\sqrt{2 \pi n p q}\), where q = 1 - p.

C.4-5 *

Show that the probability of no successes in n Bernoulli trials, each with probability
p = 1/n, is approximately 1/e. Show that the probability of exactly one success
is also approximately 1/e.

C.4-6 *

Professor Rosencrantz flips a fair coin n times, and so does Professor Guildenstern.
Show that the probability that they get the same number of heads is \(\binom{2n}{n}/4^n\). (Hint:
For Professor Rosencrantz, call a head a success; for Professor Guildenstern, call
a tail a success.) Use your argument to verify the identity

\[ \sum_{k=0}^{n} \binom{n}{k}^{2} = \binom{2n}{n} . \]

C.4-7 *

Show that for 0 < k < n,

\[ b(k; n, 1/2) \le 2^{\,n H(k/n) - n} , \]

where H(x) is the entropy function (C.7).

C.4-8 *

Consider n Bernoulli trials, where for i = 1, 2, ..., n, the ith trial has probability
\(p_i\) of success, and let X be the random variable denoting the total number of
successes. Let \(p \ge p_i\) for all i = 1, 2, ..., n. Prove that for \(1 \le k \le n\),

\[ \Pr\{X < k\} \ge \sum_{i=0}^{k-1} b(i; n, p) . \]

C.4-9 *

Let X be the random variable for the total number of successes in a set A of n
Bernoulli trials, where the ith trial has a probability \(p_i\) of success, and let X'
be the random variable for the total number of successes in a second set A' of n
Bernoulli trials, where the ith trial has a probability \(p'_i \ge p_i\) of success. Prove that
for \(0 \le k \le n\),

\[ \Pr\{X' \ge k\} \ge \Pr\{X \ge k\} . \]

(Hint: Show how to obtain the Bernoulli trials in A' by an experiment involving
the trials of A, and use the result of Exercise C.3-7.)


★  C.5  The  tails  of  the  binomial  distribution 


The probability of having at least, or at most, k successes in n Bernoulli trials,
each with probability p of success, is often of more interest than the probability of
having exactly k successes. In this section, we investigate the tails of the binomial
distribution: the two regions of the distribution b(k; n, p) that are far from the
mean np. We shall prove several important bounds on (the sum of all terms in) a
tail.

We first provide a bound on the right tail of the distribution b(k; n, p). We can
determine bounds on the left tail by inverting the roles of successes and failures.

Theorem  C.2 

Consider a sequence of n Bernoulli trials, where success occurs with probability p.
Let X be the random variable denoting the total number of successes. Then for
\(0 \le k \le n\), the probability of at least k successes is

\[ \Pr\{X \ge k\} = \sum_{i=k}^{n} b(i; n, p) \le \binom{n}{k} p^k . \]

Proof For \(S \subseteq \{1, 2, \ldots, n\}\), we let \(A_S\) denote the event that the ith trial is a
success for every \(i \in S\). Clearly \(\Pr\{A_S\} = p^k\) if |S| = k. We have

\begin{align*}
\Pr\{X \ge k\} &= \Pr\{\text{there exists } S \subseteq \{1, 2, \ldots, n\} : |S| = k \text{ and } A_S\} \\
&= \Pr\Bigl\{ \bigcup_{S \subseteq \{1, 2, \ldots, n\} : |S| = k} A_S \Bigr\} \\
&\le \sum_{S \subseteq \{1, 2, \ldots, n\} : |S| = k} \Pr\{A_S\} \\
&= \binom{n}{k} p^k .
\end{align*}

The following corollary restates the theorem for the left tail of the binomial
distribution. In general, we shall leave it to you to adapt the proofs from one tail to
the other.

Corollary  C.3 

Consider a sequence of n Bernoulli trials, where success occurs with probability p.
If X is the random variable denoting the total number of successes, then for
\(0 \le k \le n\), the probability of at most k successes is

\[ \Pr\{X \le k\} = \sum_{i=0}^{k} b(i; n, p) \le \binom{n}{n-k} (1 - p)^{n-k} = \binom{n}{k} (1 - p)^{n-k} . \]
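The two tail bounds can be checked numerically. The sketch below assumes the bounds read Pr{X ≥ k} ≤ C(n, k) p^k for the right tail (the form produced by the union-bound argument in the proof of Theorem C.2) and Pr{X ≤ k} ≤ C(n, k)(1 - p)^(n-k) for the left tail (the same statement with successes and failures exchanged). Function names are ours.

```python
from fractions import Fraction
from math import comb

def b(k, n, p):
    """Binomial probability b(k; n, p) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def right_tail(k, n, p):
    """Pr{X >= k}."""
    return sum(b(i, n, p) for i in range(k, n + 1))

def left_tail(k, n, p):
    """Pr{X <= k}."""
    return sum(b(i, n, p) for i in range(0, k + 1))

def bounds_hold(n, p):
    """Check both assumed tail bounds for every k in 0..n."""
    q = 1 - p
    return all(
        right_tail(k, n, p) <= comb(n, k) * p ** k
        and left_tail(k, n, p) <= comb(n, k) * q ** (n - k)
        for k in range(n + 1)
    )

print(bounds_hold(15, Fraction(1, 3)))
print(bounds_hold(10, Fraction(1, 2)))
```

Both bounds hold with equality at the extreme values of k (for example, k = 0 and k = n for the right tail), so they are only informative far out in the tail.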

Our  next  bound  concerns  the  left  tail  of  the  binomial  distribution.  Its  corollary 
shows  that,  far  from  the  mean,  the  left  tail  diminishes  exponentially. 

Theorem  C.4 

Consider a sequence of n Bernoulli trials, where success occurs with probability p
and failure with probability q = 1 - p. Let X be the random variable denoting the
total number of successes. Then for 0 < k < np, the probability of fewer than k
successes is

\begin{align*}
\Pr\{X < k\} &= \sum_{i=0}^{k-1} b(i; n, p) \\
&< \frac{kq}{np - k}\, b(k; n, p) .
\end{align*}

Proof We bound the series \(\sum_{i=0}^{k-1} b(i; n, p)\) by a geometric series using the
technique from Section A.2, page 1151. For i = 1, 2, ..., k, we have from
equation (C.41),

\begin{align*}
\frac{b(i-1; n, p)}{b(i; n, p)} &= \frac{iq}{(n-i+1)p} \\
&< \frac{iq}{(n-i)p} \\
&\le \frac{kq}{(n-k)p} .
\end{align*}

If we let

\begin{align*}
x &= \frac{kq}{(n-k)p} \\
&< \frac{kq}{(n-np)p} \\
&= \frac{kq}{nqp} \\
&= \frac{k}{np} \\
&< 1 ,
\end{align*}

it follows that

\[ b(i-1; n, p) < x\, b(i; n, p) \]

for \(0 < i \le k\). Iteratively applying this inequality k - i times, we obtain

\[ b(i; n, p) < x^{k-i}\, b(k; n, p) \]

for \(0 \le i < k\), and hence

\begin{align*}
\sum_{i=0}^{k-1} b(i; n, p) &< \sum_{i=0}^{k-1} x^{k-i}\, b(k; n, p) \\
&< b(k; n, p) \sum_{i=1}^{\infty} x^{i} \\
&= \frac{x}{1 - x}\, b(k; n, p) \\
&= \frac{kq}{np - k}\, b(k; n, p) .
\end{align*}
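As a sanity check on Theorem C.4, the sketch below compares Pr{X < k} with kq/(np - k) · b(k; n, p) for every integer k with 0 < k < np, using exact fractions. The helper name `left_tail_bound_holds` is ours, and p is assumed to be passed as a `Fraction`.

```python
from fractions import Fraction
from math import comb

def b(k, n, p):
    """Binomial probability b(k; n, p) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def left_tail_bound_holds(n, p):
    """Check Pr{X < k} < kq/(np - k) * b(k; n, p) for all integers 0 < k < np."""
    q = 1 - p
    for k in range(1, n + 1):
        if not k < n * p:          # the theorem only covers 0 < k < np
            break
        tail = sum(b(i, n, p) for i in range(k))      # Pr{X < k}
        bound = (k * q) / (n * p - k) * b(k, n, p)    # right-hand side
        if not tail < bound:
            return False
    return True

print(left_tail_bound_holds(15, Fraction(1, 3)))   # np = 5, so k = 1..4
print(left_tail_bound_holds(20, Fraction(3, 4)))   # np = 15, so k = 1..14
```

The factor kq/(np - k) grows without bound as k approaches np, which is why the theorem is most useful for k well below the mean.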


Corollary  C.5 

Consider a sequence of n Bernoulli trials, where success occurs with probability p
and failure with probability q = 1 - p. Then for \(0 < k \le np/2\), the probability of
fewer than k successes is less than one half of the probability of fewer than k + 1
successes.

Proof Because \(k \le np/2\), we have

\begin{align*}
\frac{kq}{np - k} &\le \frac{(np/2)q}{np - (np/2)} \\
&= \frac{(np/2)q}{np/2} \\
&\le 1 , \tag{C.42}
\end{align*}

since \(q \le 1\). Letting X be the random variable denoting the number of successes,
Theorem C.4 and inequality (C.42) imply that the probability of fewer than k
successes is

\[ \Pr\{X < k\} = \sum_{i=0}^{k-1} b(i; n, p) < b(k; n, p) . \]

Thus we have

\begin{align*}
\frac{\Pr\{X < k\}}{\Pr\{X < k + 1\}} &= \frac{\sum_{i=0}^{k-1} b(i; n, p)}{\sum_{i=0}^{k} b(i; n, p)} \\
&= \frac{\sum_{i=0}^{k-1} b(i; n, p)}{\sum_{i=0}^{k-1} b(i; n, p) + b(k; n, p)} \\
&< 1/2 ,
\end{align*}

since \(\sum_{i=0}^{k-1} b(i; n, p) < b(k; n, p)\). ■


Bounds  on  the  right  tail  follow  similarly.  Exercise  C.5-2  asks  you  to  prove  them. 


Corollary  C.6 

Consider a sequence of n Bernoulli trials, where success occurs with probability p.
Let X be the random variable denoting the total number of successes. Then for
np < k < n, the probability of more than k successes is

\begin{align*}
\Pr\{X > k\} &= \sum_{i=k+1}^{n} b(i; n, p) \\
&< \frac{(n-k)p}{k - np}\, b(k; n, p) .
\end{align*}


Corollary  C.7 

Consider  a  sequence  of  n  Bernoulli  trials,  where  success  occurs  with  probability  p 
and  failure  with  probability  q  =  1  —  p.  Then  for  (np  +  n)/2  <  k  <  n,  the 
probability  of  more  than  k  successes  is  less  than  one  half  of  the  probability  of 
more  than  k  —  1  successes.  ■ 


The next theorem considers n Bernoulli trials, each with a probability \(p_i\) of
success, for i = 1, 2, ..., n. As the subsequent corollary shows, we can use the
theorem to provide a bound on the right tail of the binomial distribution by setting
\(p_i = p\) for each trial.

Theorem  C.8 

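Consider a sequence of n Bernoulli trials, where in the ith trial, for i = 1, 2, ..., n,
success occurs with probability \(p_i\) and failure occurs with probability \(q_i = 1 - p_i\).
Let X be the random variable describing the total number of successes, and let
\(\mu = \mathrm{E}[X]\). Then for \(r > \mu\),

\[ \Pr\{X - \mu \ge r\} \le \Bigl( \frac{\mu e}{r} \Bigr)^{r} . \]

Proof Since for any \(\alpha > 0\), the function \(e^{\alpha x}\) is strictly increasing in x,

\[ \Pr\{X - \mu \ge r\} = \Pr\{e^{\alpha(X - \mu)} \ge e^{\alpha r}\} , \tag{C.43} \]

where we will determine \(\alpha\) later. Using Markov's inequality (C.30), we obtain

\[ \Pr\{e^{\alpha(X - \mu)} \ge e^{\alpha r}\} \le \mathrm{E}[e^{\alpha(X - \mu)}]\, e^{-\alpha r} . \tag{C.44} \]

The bulk of the proof consists of bounding \(\mathrm{E}[e^{\alpha(X - \mu)}]\) and substituting a
suitable value for \(\alpha\) in inequality (C.44). First, we evaluate \(\mathrm{E}[e^{\alpha(X - \mu)}]\). Using the
technique of indicator random variables (see Section 5.2), let \(X_i\) = I{the ith
Bernoulli trial is a success} for i = 1, 2, ..., n; that is, \(X_i\) is the random variable
that is 1 if the ith Bernoulli trial is a success and 0 if it is a failure. Thus,

\[ X = \sum_{i=1}^{n} X_i , \]

and by linearity of expectation,

\[ \mu = \mathrm{E}[X] = \mathrm{E}\Bigl[ \sum_{i=1}^{n} X_i \Bigr] = \sum_{i=1}^{n} \mathrm{E}[X_i] = \sum_{i=1}^{n} p_i , \]

which implies

\[ X - \mu = \sum_{i=1}^{n} (X_i - p_i) . \]

To evaluate \(\mathrm{E}[e^{\alpha(X - \mu)}]\), we substitute for \(X - \mu\), obtaining

\begin{align*}
\mathrm{E}[e^{\alpha(X - \mu)}] &= \mathrm{E}\bigl[ e^{\alpha \sum_{i=1}^{n} (X_i - p_i)} \bigr] \\
&= \mathrm{E}\Bigl[ \prod_{i=1}^{n} e^{\alpha(X_i - p_i)} \Bigr] \\
&= \prod_{i=1}^{n} \mathrm{E}[e^{\alpha(X_i - p_i)}] ,
\end{align*}

which follows from (C.24), since the mutual independence of the random variables
\(X_i\) implies the mutual independence of the random variables \(e^{\alpha(X_i - p_i)}\) (see
Exercise C.3-5). By the definition of expectation,

\begin{align*}
\mathrm{E}[e^{\alpha(X_i - p_i)}] &= e^{\alpha(1 - p_i)} p_i + e^{\alpha(0 - p_i)} q_i \\
&= p_i e^{\alpha q_i} + q_i e^{-\alpha p_i} \\
&\le p_i e^{\alpha} + 1 \tag{C.45} \\
&\le \exp(p_i e^{\alpha}) ,
\end{align*}

where exp(x) denotes the exponential function: \(\exp(x) = e^x\). (Inequality (C.45)
follows from the inequalities \(\alpha > 0\), \(q_i \le 1\), \(e^{\alpha q_i} \le e^{\alpha}\), and \(e^{-\alpha p_i} \le 1\), and the last
line follows from inequality (3.12).) Consequently,

\begin{align*}
\mathrm{E}[e^{\alpha(X - \mu)}] &= \prod_{i=1}^{n} \mathrm{E}[e^{\alpha(X_i - p_i)}] \\
&\le \prod_{i=1}^{n} \exp(p_i e^{\alpha}) \\
&= \exp(\mu e^{\alpha}) , \tag{C.46}
\end{align*}

since \(\mu = \sum_{i=1}^{n} p_i\). Therefore, from equation (C.43) and inequalities (C.44)
and (C.46), it follows that

\[ \Pr\{X - \mu \ge r\} \le \exp(\mu e^{\alpha} - \alpha r) . \tag{C.47} \]

Choosing \(\alpha = \ln(r/\mu)\) (see Exercise C.5-7), we obtain

\begin{align*}
\Pr\{X - \mu \ge r\} &\le \exp(\mu e^{\ln(r/\mu)} - r \ln(r/\mu)) \\
&= \exp(r - r \ln(r/\mu)) \\
&= \frac{e^{r}}{(r/\mu)^{r}} \\
&= \Bigl( \frac{\mu e}{r} \Bigr)^{r} . \qquad ■
\end{align*}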


When applied to Bernoulli trials in which each trial has the same probability of
success, Theorem C.8 yields the following corollary bounding the right tail of a
binomial distribution.

Corollary C.9

Consider a sequence of n Bernoulli trials, where in each trial success occurs with
probability p and failure occurs with probability q = 1 - p. Then for r > np,

\begin{align*}
\Pr\{X - np \ge r\} &= \sum_{k=\lceil np + r \rceil}^{n} b(k; n, p) \\
&\le \Bigl( \frac{npe}{r} \Bigr)^{r} .
\end{align*}

Proof By equation (C.37), we have \(\mu = \mathrm{E}[X] = np\). ■
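To get a feel for the strength of this bound, the sketch below compares the exact right tail Pr{X - np ≥ r} with (npe/r)^r, the value obtained by substituting μ = np into the last line of the proof of Theorem C.8. The parameters n = 100 and p = 1/20 and the sampled values of r are arbitrary choices for illustration.

```python
import math
from fractions import Fraction
from math import comb

def b(k, n, p):
    """Binomial probability b(k; n, p) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def right_tail_vs_bound(n, p, r):
    """Return (Pr{X - np >= r}, (npe/r)^r) for a binomial X; requires r > np."""
    mu = n * p
    start = math.ceil(mu + r)                              # smallest k with k - np >= r
    tail = sum(b(k, n, p) for k in range(start, n + 1))    # exact, as a Fraction
    bound = (float(mu) * math.e / r) ** r
    return float(tail), bound

n, p = 100, Fraction(1, 20)   # mean np = 5
for r in (10, 20, 40):
    tail, bound = right_tail_vs_bound(n, p, r)
    print(r, tail, bound)
```

The bound exceeds 1, and so says nothing, until r is larger than npe; beyond that point it falls off faster than any fixed geometric rate.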

Exercises 
C.5-1  * 

Which is less likely: obtaining no heads when you flip a fair coin n times, or
obtaining fewer than n heads when you flip the coin 4n times?

C.5-2  * 

Prove  Corollaries  C.6  and  C.7. 


C.5-3  * 

Show that

\[ \sum_{i=0}^{k-1} \binom{n}{i} a^{i} < (a + 1)^{n}\, \frac{k}{na - k(a + 1)}\, b(k; n, a/(a + 1)) \]

for all a > 0 and all k such that 0 < k < na/(a + 1).

C.5-4  * 

Prove  that  if  0  <  k  <  np,  where  0  <  p  <  1  and  q  =  1  —  p,  then 


z  —  0 


C.5-5  * 

Show that the conditions of Theorem C.8 imply that

\[ \Pr\{\mu - X \ge r\} \le \Bigl( \frac{(n - \mu)e}{r} \Bigr)^{r} . \]

Similarly, show that the conditions of Corollary C.9 imply that

\[ \Pr\{np - X \ge r\} \le \Bigl( \frac{nqe}{r} \Bigr)^{r} . \]

C.5-6 *

Consider a sequence of n Bernoulli trials, where in the ith trial, for i = 1, 2, ..., n,
success occurs with probability \(p_i\) and failure occurs with probability \(q_i = 1 - p_i\).
Let X be the random variable describing the total number of successes, and let
\(\mu = \mathrm{E}[X]\). Show that for \(r \ge 0\),

\[ \Pr\{X - \mu \ge r\} \le e^{-r^2/2n} . \]

(Hint: Prove that \(p_i e^{\alpha q_i} + q_i e^{-\alpha p_i} \le e^{\alpha^2/2}\). Then follow the outline of the proof
of Theorem C.8, using this inequality in place of inequality (C.45).)

C.5-7 *

Show that choosing \(\alpha = \ln(r/\mu)\) minimizes the right-hand side of
inequality (C.47).


Problems 


C-l  Balls  and  bins 

In this problem, we investigate the effect of various assumptions on the number of
ways of placing n balls into b distinct bins.

a. Suppose that the n balls are distinct and that their order within a bin does not
matter. Argue that the number of ways of placing the balls in the bins is \(b^n\).

b. Suppose that the balls are distinct and that the balls in each bin are ordered.
Prove that there are exactly (b + n - 1)!/(b - 1)! ways to place the balls in the
bins. (Hint: Consider the number of ways of arranging n distinct balls and b - 1
indistinguishable sticks in a row.)

c. Suppose that the balls are identical, and hence their order within a bin does not
matter. Show that the number of ways of placing the balls in the bins is \(\binom{b+n-1}{n}\).
(Hint: Of the arrangements in part (b), how many are repeated if the balls are
made identical?)

d. Suppose that the balls are identical and that no bin may contain more than one
ball, so that \(n \le b\). Show that the number of ways of placing the balls is \(\binom{b}{n}\).

e. Suppose that the balls are identical and that no bin may be left empty. Assuming
that \(n \ge b\), show that the number of ways of placing the balls is \(\binom{n-1}{b-1}\).


Appendix notes

The  first  general  methods  for  solving  probability  problems  were  discussed  in  a 
famous  correspondence  between  B.  Pascal  and  P.  de  Fermat,  which  began  in  1654, 
and  in  a  book  by  C.  Huygens  in  1657.  Rigorous  probability  theory  began  with  the 
work  of  J.  Bernoulli  in  1713  and  A.  De  Moivre  in  1730.  Further  developments  of 
the  theory  were  provided  by  P.-S.  Laplace,  S.-D.  Poisson,  and  C.  F.  Gauss. 

Sums  of  random  variables  were  originally  studied  by  P.  L.  Chebyshev  and  A.  A. 
Markov.  A.  N.  Kolmogorov  axiomatized  probability  theory  in  1933.  Chernoff  [66] 
and  Hoeffding  [173]  provided  bounds  on  the  tails  of  distributions.  Seminal  work 
in random combinatorial structures was done by P. Erdős.

Knuth  [209]  and  Liu  [237]  are  good  references  for  elementary  combinatorics 
and  counting.  Standard  textbooks  such  as  Billingsley  [46],  Chung  [67],  Drake  [95], 
Feller  [104],  and  Rozanov  [300]  offer  comprehensive  introductions  to  probability. 


D 


Matrices 


Matrices arise in numerous applications, including, but by no means limited to,
scientific computing. If you have seen matrices before, much of the material in this
appendix will be familiar to you, but some of it might be new. Section D.1 covers
basic matrix definitions and operations, and Section D.2 presents some basic matrix
properties.


D.1 Matrices and matrix operations

In this section, we review some basic concepts of matrix theory and some
fundamental properties of matrices.


Matrices  and  vectors 


A  matrix  is  a  rectangular  array  of  numbers.  For  example, 
A  _  fa  11  a  12  a  u 

\  <321  a22  <323 


(D.l) 


is  a  2  x  3  matrix  A  =  (ay),  where  for  /  =  1,2  and  j  =  1,2,3,  we  denote  the 
element  of  the  matrix  in  row  i  and  column  j  by  ay.  We  use  uppercase  letters 
to  denote  matrices  and  corresponding  subscripted  lowercase  letters  to  denote  their 
elements.  We  denote  the  set  of  all  m  x  n  matrices  with  real-valued  entries  by  Mmx" 
and,  in  general,  the  set  of  m  x  n  matrices  with  entries  drawn  from  a  set  S  by  .S’"'x". 

The  transpose  of  a  matrix  A  is  the  matrix  AT  obtained  by  exchanging  the  rows 
and  columns  of  A.  For  the  matrix  A  of  equation  (D.l), 
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A1  = 


A  vector  is  a  one-dimensional  array  of  numbers.  For  example, 

"0 

is  a  vector  of  size  3.  We  sometimes  call  a  vector  of  length  n  an  n-vector.  We 
use  lowercase  letters  to  denote  vectors,  and  we  denote  the  i  th  element  of  a  size-/* 
vector  x  by  x,,  for  i  =  1,2 We  take  the  standard  form  of  a  vector  to  be 
as  a  column  vector  equivalent  to  an  n  x  1  matrix;  the  corresponding  row  vector  is 
obtained  by  taking  the  transpose: 

xT  =  (  2  3  5  )  . 

The unit vector $e_i$ is the vector whose $i$th element is 1 and all of whose other
elements are 0. Usually, the size of a unit vector is clear from the context.

A zero matrix is a matrix all of whose entries are 0. Such a matrix is often
denoted 0, since the ambiguity between the number 0 and a matrix of 0s is usually
easily resolved from context. If a matrix of 0s is intended, then the size of the
matrix also needs to be derived from the context.


Square  matrices 

Square $n \times n$ matrices arise frequently. Several special cases of square matrices
are of particular interest:

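1. A diagonal matrix has $a_{ij} = 0$ whenever $i \ne j$. Because all of the off-diagonal
elements are zero, we can specify the matrix by listing the elements along the
diagonal:

\[
\operatorname{diag}(a_{11}, a_{22}, \ldots, a_{nn}) =
\begin{pmatrix}
a_{11} & 0 & \cdots & 0 \\
0 & a_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{pmatrix} .
\]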

2. The $n \times n$ identity matrix $I_n$ is a diagonal matrix with 1s along the diagonal:

\[
I_n = \operatorname{diag}(1, 1, \ldots, 1) =
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix} .
\]

When $I$ appears without a subscript, we derive its size from the context. The $i$th
column of an identity matrix is the unit vector $e_i$.

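3. A tridiagonal matrix $T$ is one for which $t_{ij} = 0$ if $|i - j| > 1$. Nonzero entries
appear only on the main diagonal, immediately above the main diagonal ($t_{i,i+1}$
for $i = 1, 2, \ldots, n - 1$), or immediately below the main diagonal ($t_{i+1,i}$ for
$i = 1, 2, \ldots, n - 1$):

\[
T = \begin{pmatrix}
t_{11} & t_{12} & 0 & 0 & \cdots & 0 & 0 & 0 \\
t_{21} & t_{22} & t_{23} & 0 & \cdots & 0 & 0 & 0 \\
0 & t_{32} & t_{33} & t_{34} & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & t_{n-2,n-2} & t_{n-2,n-1} & 0 \\
0 & 0 & 0 & 0 & \cdots & t_{n-1,n-2} & t_{n-1,n-1} & t_{n-1,n} \\
0 & 0 & 0 & 0 & \cdots & 0 & t_{n,n-1} & t_{nn}
\end{pmatrix} .
\]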


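4. An upper-triangular matrix $U$ is one for which $u_{ij} = 0$ if $i > j$. All entries
below the diagonal are zero:

\[
U = \begin{pmatrix}
u_{11} & u_{12} & \cdots & u_{1n} \\
0 & u_{22} & \cdots & u_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & u_{nn}
\end{pmatrix} .
\]

An upper-triangular matrix is unit upper-triangular if it has all 1s along the
diagonal.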

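5. A lower-triangular matrix $L$ is one for which $l_{ij} = 0$ if $i < j$. All entries
above the diagonal are zero:

\[
L = \begin{pmatrix}
l_{11} & 0 & \cdots & 0 \\
l_{21} & l_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
l_{n1} & l_{n2} & \cdots & l_{nn}
\end{pmatrix} .
\]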


A lower-triangular matrix is unit lower-triangular if it has all 1s along the
diagonal.

6. A permutation matrix $P$ has exactly one 1 in each row and each column, and 0s
elsewhere. An example of a permutation matrix is

\[
P = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0
\end{pmatrix} .
\]

Such a matrix is called a permutation matrix because multiplying a vector $x$
by a permutation matrix has the effect of permuting (rearranging) the elements
of $x$. Exercise D.1-4 explores additional properties of permutation matrices.

7. A symmetric matrix $A$ satisfies the condition $A = A^T$. For example,

\[
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 6 & 4 \\ 3 & 4 & 5 \end{pmatrix}
\]

is a symmetric matrix.
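
The special cases in the list above are easy to test mechanically. The following is a minimal sketch, assuming matrices are stored as lists of rows; the helper names `mat_vec`, `is_permutation_matrix`, and `is_symmetric` are hypothetical and chosen only for illustration. It applies the 5 x 5 permutation matrix of item 6 to a sample vector to show the rearranging effect.

```python
def mat_vec(a, x):
    """Multiply matrix a (a list of rows) by vector x (a list of numbers)."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in a]


def is_permutation_matrix(p):
    """True if p is square, 0-1 valued, with one 1 per row and per column."""
    n = len(p)
    if any(len(row) != n for row in p):
        return False
    if any(v not in (0, 1) for row in p for v in row):
        return False
    rows_ok = all(sum(row) == 1 for row in p)
    cols_ok = all(sum(p[i][j] for i in range(n)) == 1 for j in range(n))
    return rows_ok and cols_ok


def is_symmetric(a):
    """True if a equals its own transpose."""
    n = len(a)
    return all(len(row) == n for row in a) and all(
        a[i][j] == a[j][i] for i in range(n) for j in range(n))


P = [[0, 1, 0, 0, 0],
     [0, 0, 0, 1, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 1, 0, 0]]
x = [10, 20, 30, 40, 50]   # a sample vector with distinguishable entries

print(is_permutation_matrix(P))   # P passes the row and column test
print(mat_vec(P, x))              # the entries of x, rearranged
```

Row $i$ of $P$ has its single 1 in some column $k$, so entry $i$ of $Px$ is simply $x_k$; that is why the product only rearranges the entries.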


Basic  matrix  operations 

The  elements  of  a  matrix  or  vector  are  numbers  from  a  number  system,  such  as 
the  real  numbers,  the  complex  numbers,  or  integers  modulo  a  prime.  The  number 
system  defines  how  to  add  and  multiply  numbers.  We  can  extend  these  definitions 
to  encompass  addition  and  multiplication  of  matrices. 

We define matrix addition as follows. If $A = (a_{ij})$ and $B = (b_{ij})$ are $m \times n$
matrices, then their matrix sum $C = (c_{ij}) = A + B$ is the $m \times n$ matrix defined by

\[
c_{ij} = a_{ij} + b_{ij}
\]

for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$. That is, matrix addition is performed
componentwise. A zero matrix is the identity for matrix addition:

\[
A + 0 = A = 0 + A .
\]

If $\lambda$ is a number and $A = (a_{ij})$ is a matrix, then $\lambda A = (\lambda a_{ij})$ is the scalar
multiple of $A$ obtained by multiplying each of its elements by $\lambda$. As a special case,
we define the negative of a matrix $A = (a_{ij})$ to be $-1 \cdot A = -A$, so that the $ij$th
entry of $-A$ is $-a_{ij}$. Thus,

\[
A + (-A) = 0 = (-A) + A .
\]

We use the negative of a matrix to define matrix subtraction: $A - B = A + (-B)$.
We define matrix multiplication as follows. We start with two matrices $A$ and $B$
that are compatible in the sense that the number of columns of $A$ equals the number
of rows of $B$. (In general, an expression containing a matrix product $AB$ is always
assumed to imply that matrices $A$ and $B$ are compatible.) If $A = (a_{ik})$ is an $m \times n$
matrix and $B = (b_{kj})$ is an $n \times p$ matrix, then their matrix product $C = AB$ is the
$m \times p$ matrix $C = (c_{ij})$, where

\[
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}
\tag{D.2}
\]

for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, p$. The procedure Square-Matrix-Multiply
in Section 4.2 implements matrix multiplication in the straightforward
manner based on equation (D.2), assuming that the matrices are square:
$m = n = p$. To multiply $n \times n$ matrices, Square-Matrix-Multiply performs
$n^3$ multiplications and $n^2(n - 1)$ additions, and so its running time is $\Theta(n^3)$.
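
The triple loop implied by equation (D.2) can be sketched as follows. This is an illustrative Python rendering under the assumption that matrices are lists of rows; it is not the pseudocode of Section 4.2 itself, and the multiplication counter is added here only to make the $n^3$ count visible.

```python
def square_matrix_multiply(a, b):
    """Multiply two n x n matrices; return (product, number of multiplications)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(n):
            # c[i][j] is the sum over k of a[i][k] * b[k][j], as in (D.2).
            # Starting from 0 costs one more addition per entry than the
            # n - 1 additions strictly needed to sum n products.
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                mults += 1
    return c, mults


product, mults = square_matrix_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product, mults)
```

For $n = 2$ the counter reports $2^3 = 8$ scalar multiplications, in line with the count stated above.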

Matrices have many (but not all) of the algebraic properties typical of numbers.
Identity matrices are identities for matrix multiplication:

\[
I_m A = A I_n = A
\]

for any $m \times n$ matrix $A$. Multiplying by a zero matrix gives a zero matrix:

\[
A \, 0 = 0 .
\]

Matrix multiplication is associative:

\[
A(BC) = (AB)C
\]

for compatible matrices $A$, $B$, and $C$. Matrix multiplication distributes over
addition:

\[
\begin{aligned}
A(B + C) &= AB + AC , \\
(B + C)D &= BD + CD .
\end{aligned}
\]

For $n > 1$, multiplication of $n \times n$ matrices is not commutative. For example, if

\[
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\quad \text{and} \quad
B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} ,
\]

then

\[
AB = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\quad \text{and} \quad
BA = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} .
\]

We define matrix-vector products or vector-vector products as if the vector were
the equivalent $n \times 1$ matrix (or a $1 \times n$ matrix, in the case of a row vector). Thus,
if $A$ is an $m \times n$ matrix and $x$ is an $n$-vector, then $Ax$ is an $m$-vector. If $x$ and $y$
are $n$-vectors, then

\[
x^T y = \sum_{i=1}^{n} x_i y_i
\]

is a number (actually a $1 \times 1$ matrix) called the inner product of $x$ and $y$. The
matrix $x y^T$ is an $n \times n$ matrix $Z$ called the outer product of $x$ and $y$, with $z_{ij} = x_i y_j$.
The (euclidean) norm $\|x\|$ of an $n$-vector $x$ is defined by

\[
\|x\| = (x_1^2 + x_2^2 + \cdots + x_n^2)^{1/2} = (x^T x)^{1/2} .
\]

Thus, the norm of $x$ is its length in $n$-dimensional euclidean space.
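
The three vector operations just defined translate directly into code. The sketch below assumes vectors are plain Python lists of numbers; the function names `inner`, `outer`, and `norm` are hypothetical labels for the inner product, outer product, and euclidean norm.

```python
import math


def inner(x, y):
    """Inner product x^T y of two n-vectors: a single number."""
    return sum(xi * yi for xi, yi in zip(x, y))


def outer(x, y):
    """Outer product x y^T: the n x n matrix Z with z_ij = x_i * y_j."""
    return [[xi * yj for yj in y] for xi in x]


def norm(x):
    """Euclidean norm (x^T x)^(1/2)."""
    return math.sqrt(inner(x, x))


print(inner([2, 3, 5], [1, 0, 2]))
print(outer([1, 2], [3, 4]))
print(norm([3, 4]))
```

Defining `norm` through `inner` mirrors the second form of the definition, $\|x\| = (x^T x)^{1/2}$.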

Exercises 


D.1-1

Show that if $A$ and $B$ are symmetric $n \times n$ matrices, then so are $A + B$ and $A - B$.


D.1-2

Prove that $(AB)^T = B^T A^T$ and that $A^T A$ is always a symmetric matrix.


D.1-3

Prove that the product of two lower-triangular matrices is lower-triangular.


D.1-4

Prove that if $P$ is an $n \times n$ permutation matrix and $A$ is an $n \times n$ matrix, then the
matrix product $PA$ is $A$ with its rows permuted, and the matrix product $AP$ is $A$
with its columns permuted. Prove that the product of two permutation matrices is
a permutation matrix.


D.2  Basic  matrix  properties 

In this section, we define some basic properties pertaining to matrices: inverses,
linear dependence and independence, rank, and determinants. We also define the
class of positive-definite matrices.


Matrix inverses, ranks, and determinants

We define the inverse of an $n \times n$ matrix $A$ to be the $n \times n$ matrix, denoted $A^{-1}$ (if
it exists), such that $A A^{-1} = I_n = A^{-1} A$. For example,

\[
\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{-1}
= \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} .
\]

Many nonzero $n \times n$ matrices do not have inverses. A matrix without an inverse is
called noninvertible, or singular. An example of a nonzero singular matrix is

\[
\begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} .
\]

If a matrix has an inverse, it is called invertible, or nonsingular. Matrix inverses,
when they exist, are unique. (See Exercise D.2-1.) If $A$ and $B$ are nonsingular
$n \times n$ matrices, then

\[
(BA)^{-1} = A^{-1} B^{-1} .
\]

The inverse operation commutes with the transpose operation:

\[
(A^{-1})^T = (A^T)^{-1} .
\]

The vectors $x_1, x_2, \ldots, x_n$ are linearly dependent if there exist coefficients
$c_1, c_2, \ldots, c_n$, not all of which are zero, such that $c_1 x_1 + c_2 x_2 + \cdots + c_n x_n = 0$.
The row vectors $x_1 = (\,1 \;\; 2 \;\; 3\,)$, $x_2 = (\,2 \;\; 6 \;\; 4\,)$, and $x_3 = (\,4 \;\; 11 \;\; 9\,)$ are
linearly dependent, for example, since $2x_1 + 3x_2 - 2x_3 = 0$. If vectors are not
linearly dependent, they are linearly independent. For example, the columns of an
identity matrix are linearly independent.
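
The dependence claimed for the three row vectors above can be checked by direct arithmetic. The sketch below assumes vectors are Python lists; `linear_combination` is a hypothetical helper that forms $c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$ componentwise.

```python
def linear_combination(coeffs, vectors):
    """Return c_1*x_1 + ... + c_n*x_n for equal-length vectors x_1..x_n."""
    size = len(vectors[0])
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(size)]


x1 = [1, 2, 3]
x2 = [2, 6, 4]
x3 = [4, 11, 9]

# Coefficients 2, 3, -2 are not all zero, yet the combination vanishes,
# so x1, x2, x3 are linearly dependent.
print(linear_combination([2, 3, -2], [x1, x2, x3]))
```

Component by component, the combination is $2 + 6 - 8$, $4 + 18 - 22$, and $6 + 12 - 18$, each of which is 0.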

The column rank of a nonzero $m \times n$ matrix $A$ is the size of the largest set
of linearly independent columns of $A$. Similarly, the row rank of $A$ is the size
of the largest set of linearly independent rows of $A$. A fundamental property of
any matrix $A$ is that its row rank always equals its column rank, so that we can
simply refer to the rank of $A$. The rank of an $m \times n$ matrix is an integer between 0
and $\min(m, n)$, inclusive. (The rank of a zero matrix is 0, and the rank of an $n \times n$
identity matrix is $n$.) An alternate, but equivalent and often more useful, definition
is that the rank of a nonzero $m \times n$ matrix $A$ is the smallest number $r$ such that
there exist matrices $B$ and $C$ of respective sizes $m \times r$ and $r \times n$ such that

\[
A = BC .
\]
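
One way to compute a rank in practice is row reduction (Gaussian elimination), a technique this appendix does not describe; it is swapped in here only as an illustration. The sketch below assumes matrices are lists of rows with integer or rational entries and uses exact `Fraction` arithmetic so that no rounding can disturb the zero tests. The function name `rank` is a hypothetical choice.

```python
from fractions import Fraction


def rank(a):
    """Rank of a matrix (list of rows) by row reduction over the rationals."""
    m = [[Fraction(v) for v in row] for row in a]
    rows, cols = len(m), len(m[0])
    r = 0  # number of pivot rows found so far
    for c in range(cols):
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every row below the pivot row.
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            m[i] = [vi - factor * vr for vi, vr in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r


print(rank([[1, 2, 3], [2, 6, 4], [4, 11, 9]]))
```

The rows of the sample matrix are the vectors $x_1$, $x_2$, $x_3$ from the linear-dependence example, so its rank must be less than 3.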

A square $n \times n$ matrix has full rank if its rank is $n$. An $m \times n$ matrix has full
column rank if its rank is $n$. The following theorem gives a fundamental property
of ranks.

Theorem D.1

A square matrix has full rank if and only if it is nonsingular. ■


A  null  vector  for  a  matrix  A  is  a  nonzero  vector  x  such  that  Ax  =  0.  The 
following  theorem  (whose  proof  is  left  as  Exercise  D.2-7)  and  its  corollary  relate 
the  notions  of  column  rank  and  singularity  to  null  vectors. 

Theorem  D.2 

A  matrix  A  has  full  column  rank  if  and  only  if  it  does  not  have  a  null  vector.  ■ 


Corollary  D.3 

A  square  matrix  A  is  singular  if  and  only  if  it  has  a  null  vector.  ■ 


The $ij$th minor of an $n \times n$ matrix $A$, for $n > 1$, is the $(n - 1) \times (n - 1)$ matrix $A_{[ij]}$
obtained by deleting the $i$th row and $j$th column of $A$. We define the determinant
of an $n \times n$ matrix $A$ recursively in terms of its minors by

\[
\det(A) =
\begin{cases}
a_{11} & \text{if } n = 1 , \\[4pt]
\displaystyle\sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det(A_{[1j]}) & \text{if } n > 1 .
\end{cases}
\]

The term $(-1)^{i+j} \det(A_{[ij]})$ is known as the cofactor of the element $a_{ij}$.
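
The recursive definition can be transcribed almost literally. The sketch below assumes a square matrix stored as a list of rows; `minor` and `det` are hypothetical names. It expands along the first row exactly as in the definition, which takes time exponential in $n$ and is therefore suitable only for small illustrative matrices.

```python
def minor(a, i, j):
    """Matrix obtained from a by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    # With 0-based column index j, the sign (-1)^(1 + (j + 1)) equals (-1)^j.
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(n))


print(det([[1, 2], [3, 4]]))
print(det([[1, 2, 3], [2, 6, 4], [4, 11, 9]]))
```

The second matrix has linearly dependent rows (they are the vectors from the linear-dependence example), so its determinant comes out as 0.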

The  following  theorems,  whose  proofs  are  omitted  here,  express  fundamental 
properties  of  the  determinant. 

Theorem  D.4  (Determinant  properties) 

The  determinant  of  a  square  matrix  A  has  the  following  properties: 

•  If  any  row  or  any  column  of  A  is  zero,  then  det(A)  =  0. 

• The determinant of $A$ is multiplied by $\lambda$ if the entries of any one row (or any
one column) of $A$ are all multiplied by $\lambda$.

• The determinant of $A$ is unchanged if the entries in one row (respectively,
column) are added to those in another row (respectively, column).

• The determinant of $A$ equals the determinant of $A^T$.

• The determinant of $A$ is multiplied by $-1$ if any two rows (or any two columns)
are exchanged.

Also, for any square matrices $A$ and $B$, we have $\det(AB) = \det(A) \det(B)$. ■


Theorem D.5

An $n \times n$ matrix $A$ is singular if and only if $\det(A) = 0$. ■


Positive-definite  matrices 

Positive-definite matrices play an important role in many applications. An $n \times n$
matrix $A$ is positive-definite if $x^T A x > 0$ for all $n$-vectors $x \ne 0$. For
example, the identity matrix is positive-definite, since for any nonzero vector
$x = (\,x_1 \;\; x_2 \;\; \cdots \;\; x_n\,)^T$,

\[
x^T I_n x = x^T x = \sum_{i=1}^{n} x_i^2 > 0 .
\]

Matrices  that  arise  in  applications  are  often  positive-definite  due  to  the  following 
theorem. 

Theorem  D.6 

For any matrix $A$ with full column rank, the matrix $A^T A$ is positive-definite.

Proof We must show that $x^T (A^T A) x > 0$ for any nonzero vector $x$. For any
vector $x$,

\[
\begin{aligned}
x^T (A^T A) x &= (Ax)^T (Ax) \qquad \text{(by Exercise D.1-2)} \\
&= \|Ax\|^2 .
\end{aligned}
\]

Note that $\|Ax\|^2$ is just the sum of the squares of the elements of the vector $Ax$.
Therefore, $\|Ax\|^2 \ge 0$. If $\|Ax\|^2 = 0$, every element of $Ax$ is 0, which is to say
$Ax = 0$. Since $A$ has full column rank, $Ax = 0$ implies $x = 0$, by Theorem D.2.
Hence, $A^T A$ is positive-definite. ■
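
The identity at the heart of the proof, $x^T (A^T A) x = \|Ax\|^2$, can be checked numerically. The sketch below assumes matrices are lists of rows; the $3 \times 2$ matrix `A` is a hypothetical example with full column rank, and all helper names are illustrative. A handful of test vectors cannot prove positive-definiteness, but it does show the two sides of the identity agreeing.

```python
def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def mat_mul(a, b):
    """Product of compatible matrices a and b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_vec(a, x):
    """Product of matrix a and vector x."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in a]


def quad_form(m, x):
    """The quadratic form x^T m x for a square matrix m."""
    n = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(n) for j in range(n))


A = [[1, 0],
     [1, 1],
     [0, 2]]                     # hypothetical 3 x 2 matrix, full column rank
G = mat_mul(transpose(A), A)     # the 2 x 2 matrix A^T A

for x in ([1, -1], [3, 2], [0, 1], [-2, 5]):
    ax = mat_vec(A, x)
    # Both printed values are the squared norm of Ax.
    print(quad_form(G, x), sum(v * v for v in ax))
```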

Section  28.3  explores  other  properties  of  positive-definite  matrices. 

Exercises 


D.2-1 

Prove  that  matrix  inverses  are  unique,  that  is,  if  B  and  C  are  inverses  of  A,  then 
B  =  C. 


D.2-2

Prove that the determinant of a lower-triangular or upper-triangular matrix is equal
to the product of its diagonal elements. Prove that the inverse of a lower-triangular
matrix, if it exists, is lower-triangular.


D.2-3

Prove that if $P$ is a permutation matrix, then $P$ is invertible, its inverse is $P^T$,
and $P^T$ is a permutation matrix.


D.2-4 

Let $A$ and $B$ be $n \times n$ matrices such that $AB = I$. Prove that if $A'$ is obtained
from $A$ by adding row $j$ into row $i$, then subtracting column $i$ from column $j$ of $B$
yields the inverse $B'$ of $A'$.


D.2-5 

Let $A$ be a nonsingular $n \times n$ matrix with complex entries. Show that every entry
of $A^{-1}$ is real if and only if every entry of $A$ is real.


D.2-6

Show that if $A$ is a nonsingular, symmetric, $n \times n$ matrix, then $A^{-1}$ is symmetric.
Show that if $B$ is an arbitrary $m \times n$ matrix, then the $m \times m$ matrix given by the
product $B A B^T$ is symmetric.


D.2-7 

Prove Theorem D.2. That is, show that a matrix $A$ has full column rank if and only
if $Ax = 0$ implies $x = 0$. (Hint: Express the linear dependence of one column on
the others as a matrix-vector equation.)


D.2-8 

Prove that for any two compatible matrices $A$ and $B$,

\[
\operatorname{rank}(AB) \le \min(\operatorname{rank}(A), \operatorname{rank}(B)) ,
\]

where equality holds if either $A$ or $B$ is a nonsingular square matrix. (Hint: Use
the alternate definition of the rank of a matrix.)


Problems 


D-1 Vandermonde matrix

Given numbers $x_0, x_1, \ldots, x_{n-1}$, prove that the determinant of the Vandermonde
matrix

\[
V(x_0, x_1, \ldots, x_{n-1}) =
\begin{pmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^{n-1} \\
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^{n-1}
\end{pmatrix}
\]

is

\[
\det(V(x_0, x_1, \ldots, x_{n-1})) = \prod_{0 \le j < k \le n-1} (x_k - x_j) .
\]

(Hint: Multiply column $i$ by $-x_0$ and add it to column $i + 1$ for $i = n - 1,
n - 2, \ldots, 1$, and then use induction.)

D-2 Permutations defined by matrix-vector multiplication over GF(2)

One class of permutations of the integers in the set $S_n = \{0, 1, 2, \ldots, 2^n - 1\}$ is
defined by matrix multiplication over GF(2). For each integer $x$ in $S_n$, we view its
binary representation as an $n$-bit vector

\[
\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_{n-1} \end{pmatrix} ,
\]

where $x = \sum_{i=0}^{n-1} x_i 2^i$. If $A$ is an $n \times n$ matrix in which each entry is either 0
or 1, then we can define a permutation mapping each value $x$ in $S_n$ to the number
whose binary representation is the matrix-vector product $Ax$. Here, we perform
all arithmetic over GF(2): all values are either 0 or 1, and with one exception the
usual rules of addition and multiplication apply. The exception is that $1 + 1 = 0$.
You can think of arithmetic over GF(2) as being just like regular integer arithmetic,
except that you use only the least significant bit.

As an example, for $S_2 = \{0, 1, 2, 3\}$, the matrix

\[
A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}
\]

defines the following permutation $\pi_A$: $\pi_A(0) = 0$, $\pi_A(1) = 3$, $\pi_A(2) = 2$,
$\pi_A(3) = 1$. To see why $\pi_A(3) = 1$, observe that, working in GF(2),

\[
\pi_A(3)
= \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 \cdot 1 + 0 \cdot 1 \\ 1 \cdot 1 + 1 \cdot 1 \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \end{pmatrix} ,
\]

which is the binary representation of 1.

For the remainder of this problem, we work over GF(2), and all matrix and
vector entries are 0 or 1. We define the rank of a 0-1 matrix (a matrix for which
each entry is either 0 or 1) over GF(2) the same as for a regular matrix, but with all
arithmetic that determines linear independence performed over GF(2). We define
the range of an $n \times n$ 0-1 matrix $A$ by

\[
R(A) = \{ y : y = Ax \text{ for some } x \in S_n \} ,
\]

so that $R(A)$ is the set of numbers in $S_n$ that we can produce by multiplying each
value $x$ in $S_n$ by $A$.

a. If $r$ is the rank of matrix $A$, prove that $|R(A)| = 2^r$. Conclude that $A$ defines a
permutation on $S_n$ only if $A$ has full rank.

For a given $n \times n$ matrix $A$ and a given value $y \in R(A)$, we define the preimage
of $y$ by

\[
P(A, y) = \{ x : Ax = y \} ,
\]

so that $P(A, y)$ is the set of values in $S_n$ that map to $y$ when multiplied by $A$.

b. If $r$ is the rank of $n \times n$ matrix $A$ and $y \in R(A)$, prove that $|P(A, y)| = 2^{n-r}$.

Let $0 \le m \le n$, and suppose we partition the set $S_n$ into blocks of consecutive
numbers, where the $i$th block consists of the $2^m$ numbers $i 2^m, i 2^m + 1,
i 2^m + 2, \ldots, (i + 1) 2^m - 1$. For any subset $S \subseteq S_n$, define $B(S, m)$ to be the
set of size-$2^m$ blocks of $S_n$ containing some element of $S$. As an example, when
$n = 3$, $m = 1$, and $S = \{1, 4, 5\}$, then $B(S, m)$ consists of blocks 0 (since 1 is in
the 0th block) and 2 (since both 4 and 5 are in block 2).

c. Let $r$ be the rank of the lower left $(n - m) \times m$ submatrix of $A$, that is, the
matrix formed by taking the intersection of the bottom $n - m$ rows and the
leftmost $m$ columns of $A$. Let $S$ be any size-$2^m$ block of $S_n$, and let $S' =
\{ y : y = Ax \text{ for some } x \in S \}$. Prove that $|B(S', m)| = 2^r$ and that for each
block in $B(S', m)$, exactly $2^{m-r}$ numbers in $S$ map to that block.

Because multiplying the zero vector by any matrix yields a zero vector, the set
of permutations of $S_n$ defined by multiplying by $n \times n$ 0-1 matrices with full rank
over GF(2) cannot include all permutations of $S_n$. Let us extend the class of
permutations defined by matrix-vector multiplication to include an additive term, so
that $x \in S_n$ maps to $Ax + c$, where $c$ is an $n$-bit vector and addition is performed
over GF(2). For example, when

\[
A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}
\quad \text{and} \quad
c = \begin{pmatrix} 0 \\ 1 \end{pmatrix} ,
\]

we get the following permutation $\pi_{A,c}$: $\pi_{A,c}(0) = 2$, $\pi_{A,c}(1) = 1$, $\pi_{A,c}(2) = 0$,
$\pi_{A,c}(3) = 3$. We call any permutation that maps $x \in S_n$ to $Ax + c$, for some $n \times n$
0-1 matrix $A$ with full rank and some $n$-bit vector $c$, a linear permutation.
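
The two small permutations in this problem can be reproduced with a few lines of code. The sketch below is illustrative only: it assumes the 2 x 2 matrix with rows (1, 0) and (1, 1) and the additive vector (0, 1), which are the values consistent with the permutation values listed above, and it stores bit 0 (the least significant bit) first, matching $x = \sum_i x_i 2^i$. The names `to_bits`, `from_bits`, and `apply_affine` are hypothetical.

```python
def to_bits(x, n):
    """Bits of x as a list [x_0, x_1, ..., x_{n-1}], least significant first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits: the integer whose i-th bit is bits[i]."""
    return sum(b << i for i, b in enumerate(bits))


def apply_affine(a, c, x):
    """Map x to the integer represented by A x + c, with arithmetic over GF(2)."""
    n = len(a)
    xb = to_bits(x, n)
    # Reducing modulo 2 keeps only the least significant bit, so 1 + 1 = 0.
    yb = [(sum(a[i][k] * xb[k] for k in range(n)) + c[i]) % 2 for i in range(n)]
    return from_bits(yb)


A = [[1, 0],
     [1, 1]]

print([apply_affine(A, [0, 0], x) for x in range(4)])   # multiplication by A only
print([apply_affine(A, [0, 1], x) for x in range(4)])   # with the additive term c
```

Setting `c` to the zero vector recovers plain matrix-vector multiplication, so the same function covers both classes of permutation.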

d.  Use  a  counting  argument  to  show  that  the  number  of  linear  permutations  of  Sn 
is  much  less  than  the  number  of  permutations  of  Sn . 

e. Give an example of a value of $n$ and a permutation of $S_n$ that cannot be achieved
by any linear permutation. (Hint: For a given permutation, think about how
multiplying a matrix by a unit vector relates to the columns of the matrix.)


Appendix  notes 

Linear-algebra textbooks provide plenty of background information on matrices.
The books by Strang [323, 324] are particularly good.

This index uses the following conventions. Numbers are alphabetized as if spelled
out; for example, “2-3-4 tree” is indexed as if it were “two-three-four tree.” When
an entry refers to a place other than the main text, the page number is followed by
a tag: ex. for exercise, pr. for problem, fig. for figure, and n. for footnote. A tagged
page number often indicates the first page of an exercise or problem, which is not
necessarily the page on which the reference actually appears.


α(n), 574
φ (golden ratio), 59, 108 pr.
φ̂ (conjugate of the golden ratio), 59
φ(n) (Euler’s phi function), 943
ρ(n)-approximation algorithm, 1106, 1123
o-notation, 50–51, 64
O-notation, 45 fig., 47–48, 64
O′-notation, 62 pr.
Õ-notation, 62 pr.
ω-notation, 51
Ω-notation, 45 fig., 48–49, 64
$\overset{\infty}{\Omega}$-notation, 62 pr.
Ω̃-notation, 62 pr.
Θ-notation, 44–47, 45 fig., 64
Θ̃-notation, 62 pr.
{ } (set), 1158
∈ (set member), 1158
∉ (not a set member), 1158
∅
  (empty language), 1058
  (empty set), 1158
⊆ (subset), 1159
⊂ (proper subset), 1159
: (such that), 1159
∩ (set intersection), 1159
∪ (set union), 1159
− (set difference), 1159
| |
  (flow value), 710
  (length of a string), 986
  (set cardinality), 1161
×
  (Cartesian product), 1162
  (cross product), 1016
⟨ ⟩
  (sequence), 1166
  (standard encoding), 1057
$\binom{n}{k}$ (choose), 1185
‖ ‖ (euclidean norm), 1222
! (factorial), 57
⌈ ⌉ (ceiling), 54
⌊ ⌋ (floor), 54
$\sqrt[\downarrow]{\ }$ (lower square root), 546
$\sqrt[\uparrow]{\ }$ (upper square root), 546
Σ (sum), 1145
Π (product), 1148
→ (adjacency relation), 1169
⇝ (reachability relation), 1170
∧ (AND), 697, 1071
¬ (NOT), 1071
∨ (OR), 697, 1071
⊕ (group operator), 939
⊗ (convolution operator), 901
* (closure operator), 1058
| (divides relation), 927
∤ (does not divide relation), 927
≡ (equivalent modulo n), 54, 1165 ex.
≢ (not equivalent modulo n), 54
[a]_n (equivalence class modulo n), 928
+_n (addition modulo n), 940
·_n (multiplication modulo n), 940
$\left(\frac{a}{p}\right)$ (Legendre symbol), 982 pr.
ε (empty string), 986, 1058
⊏ (prefix relation), 986
⊐ (suffix relation), 986
≽_x (above relation), 1022
// (comment symbol), 21
≫ (much-greater-than relation), 574
≪ (much-less-than relation), 783
≤_P (polynomial-time reducibility relation), 1067, 1077 ex.

AA-tree, 338
abelian group, 940
Above, 1024
above relation (≽_x), 1022
absent child, 1178
absolutely convergent series, 1146
absorption laws for sets, 1160
abstract problem, 1054
acceptable pair of integers, 972
acceptance
  by an algorithm, 1058
  by a finite automaton, 996
accepting state, 995
accounting method, 456–459
  for binary counters, 458
  for dynamic tables, 465–466
  for stack operations, 457–458, 458 ex.
Ackermann’s function, 585
activity-selection problem, 415–422, 450
acyclic graph, 1170
  relation to matroids, 448 pr.
add instruction, 23
addition
  of binary integers, 22 ex.
  of matrices, 1220
  modulo n (+_n), 940
  of polynomials, 898
additive group modulo n, 940
addressing, open, see open-address hash table
Add-Subarray, 805 pr.
adjacency-list representation, 590
  replaced by a hash table, 593 ex.
adjacency-matrix representation, 591
adjacency relation (→), 1169
adjacent vertices, 1169
admissible edge, 749
admissible network, 749–750
adversary, 190
aggregate analysis, 452–456
  for binary counters, 454–455
  for breadth-first search, 597
  for depth-first search, 606
  for Dijkstra’s algorithm, 661
  for disjoint-set data structures, 566–567, 568 ex.
  for dynamic tables, 465
  for Fibonacci heaps, 518, 522 ex.
  for Graham’s scan, 1036
  for the Knuth-Morris-Pratt algorithm, 1006
  for Prim’s algorithm, 636
  for rod cutting, 367
  for shortest paths in a dag, 655
  for stack operations, 452–454
aggregate flow, 863
Akra-Bazzi method for solving a recurrence, 112–113
algorithm, 5
  correctness of, 6
  origin of word, 42
  running time of, 25
  as a technology, 13
Alice, 959
Allocate-Node, 492
Allocate-Object, 244
allocation of objects, 243–244
all-pairs shortest paths, 644, 684–707
  in dynamic graphs, 707
  in ε-dense graphs, 706 pr.
  Floyd-Warshall algorithm for, 693–697, 706
  Johnson’s algorithm for, 700–706
  by matrix multiplication, 686–693, 706–707
  by repeated squaring, 689–691
alphabet, 995, 1057
α(n), 574

amortized analysis, 451–478
  accounting method of, 456–459
  aggregate analysis, 367, 452–456
  for bit-reversal permutation, 472 pr.
  for breadth-first search, 597
  for depth-first search, 606
  for Dijkstra’s algorithm, 661
  for disjoint-set data structures, 566–567, 568 ex., 572 ex., 575–581, 581–582 ex.
  for dynamic tables, 463–471
  for Fibonacci heaps, 509–512, 517–518, 520–522, 522 ex.
  for the generic push-relabel algorithm, 746
  for Graham’s scan, 1036
  for the Knuth-Morris-Pratt algorithm, 1006
  for making binary search dynamic, 473 pr.
  potential method of, 459–463
  for restructuring red-black trees, 474 pr.
  for self-organizing lists with move-to-front, 476 pr.
  for shortest paths in a dag, 655
  for stacks on secondary storage, 502 pr.
  for weight-balanced trees, 473 pr.
amortized cost
  in the accounting method, 456
  in aggregate analysis, 452
  in the potential method, 459
ancestor, 1176
  least common, 584 pr.
AND function (∧), 697, 1071
AND gate, 1070
and, in pseudocode, 22
antiparallel edges, 711–712
antisymmetric relation, 1164
Any-Segments-Intersect, 1025
approximation
  by least squares, 835–839
  of summation by integrals, 1154–1156
approximation algorithm, 10, 1105–1140
  for bin packing, 1134 pr.
  for MAX-CNF satisfiability, 1127 ex.
  for maximum clique, 1111 ex., 1134 pr.
  for maximum matching, 1135 pr.
  for maximum spanning tree, 1137 pr.
  for maximum weight cut, 1127 ex.
  for MAX-3-CNF satisfiability, 1123–1124, 1139
  for minimum-weight vertex cover, 1124–1127, 1139
  for parallel machine scheduling, 1136 pr.
  randomized, 1123
  for set cover, 1117–1122, 1139
  for subset sum, 1128–1134, 1139
  for traveling-salesman problem, 1111–1117, 1139
  for vertex cover, 1108–1111, 1139
  for weighted set cover, 1135 pr.
  for 0-1 knapsack problem, 1137 pr., 1139
approximation error, 836
approximation ratio, 1106, 1123
approximation scheme, 1107
Approx-Min-Weight-VC, 1126
Approx-Subset-Sum, 1131
Approx-TSP-Tour, 1112
Approx-Vertex-Cover, 1109
arbitrage, 679 pr.
arc, see edge
argument of a function, 1166–1167
arithmetic instructions, 23
arithmetic, modular, 54, 939–946
arithmetic series, 1146
arithmetic with infinities, 650
arm, 485
array, 21
  Monge, 110 pr.
  passing as a parameter, 21
articulation point, 621 pr.
assignment
  multiple, 21
  satisfying, 1072, 1079
  truth, 1072, 1079
associative laws for sets, 1160
associative operation, 939
asymptotically larger, 52
asymptotically nonnegative, 45
asymptotically positive, 45
asymptotically smaller, 52
asymptotically tight bound, 45
asymptotic efficiency, 43
asymptotic lower bound, 48
asymptotic notation, 43–53, 62 pr.
  and graph algorithms, 588
  and linearity of summations, 1146
asymptotic upper bound, 47
attribute of an object, 21
augmentation of a flow, 716
augmenting data structures, 339–355
augmenting path, 719–720, 763 pr.
authentication, 284 pr., 960–961, 964
automaton
  finite, 995
  string-matching, 996–1002
auxiliary hash function, 272
auxiliary linear program, 886
average-case running time, 28, 116
AVL-Insert, 333 pr.
AVL tree, 333 pr., 337
axioms, for probability, 1190

babyface, 602 ex.
back edge, 609, 613
back substitution, 817
Bad-Set-Cover-Instance, 1122 ex.
Balance, 333 pr.
balanced search tree
  AA-trees, 338
  AVL trees, 333 pr., 337
  B-trees, 484–504
  k-neighbor trees, 338
  red-black trees, 308–338
  scapegoat trees, 338
  splay trees, 338, 482
  treaps, 333 pr., 338
  2-3-4 trees, 489, 503 pr.
  2-3 trees, 337, 504
  weight-balanced trees, 338, 473 pr.
balls and bins, 133–134, 1215 pr.
base-a pseudoprime, 967
base case, 65, 84
base, in DNA, 391
basic feasible solution, 866
basic solution, 866
basic variable, 855
basis function, 835
Bayes’s theorem, 1194
Bellman-Ford, 651
Bellman-Ford algorithm, 651–655, 682
  for all-pairs shortest paths, 684
  in Johnson’s algorithm, 702–704
  and objective functions, 670 ex.
  to solve systems of difference constraints, 668
  Yen’s improvement to, 678 pr.
Below, 1024
Bernoulli trial, 1201
  and balls and bins, 133–134
  and streaks, 135–139
best-case running time, 29 ex., 49
BFS, 595
Biased-Random, 117 ex.
biconnected component, 621 pr.
big-oh notation, 45 fig., 47–48, 64
big-omega notation, 45 fig., 48–49, 64
bijective function, 1167
binary character code, 428
binary counter
  analyzed by accounting method, 458
  analyzed by aggregate analysis, 454–455
  analyzed by potential method, 461–462
  bit-reversed, 472 pr.
binary entropy function, 1187
binary gcd algorithm, 981 pr.
binary heap, see heap
binary relation, 1163
binary search, 39 ex.
  with fast insertion, 473 pr.
  in insertion sort, 39 ex.
  in multithreaded merging, 799–800
  in searching B-trees, 499 ex.
Binary-Search, 799
binary search tree, 286–307
  AA-trees, 338
  AVL trees, 333 pr., 337
  deletion from, 295–298, 299 ex.
  with equal keys, 303 pr.
  insertion into, 294–295
  k-neighbor trees, 338
  maximum key of, 291
  minimum key of, 291
  optimal, 397–404, 413
  predecessor in, 291–292
  querying, 289–294
  randomly built, 299–303, 304 pr.
  right-converting of, 314 ex.
  scapegoat trees, 338
  searching, 289–291
  for sorting, 299 ex.
  splay trees, 338
  successor in, 291–292
  and treaps, 333 pr.
  weight-balanced trees, 338
  see also red-black tree
binary-search-tree property, 287
  in treaps, 333 pr.
vs.  min  heap  property,  289  ex. 
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binary  tree,  1 177 
full,  1178 

number  of  different  ones,  306  pr. 
representation  of,  246 
superimposed  upon  a  bit  vector,  533  534 
see  also  binary  search  tree 
binomial  coefficient,  1186  1187
binomial  distribution,  1203  1206 
and  balls  and  bins,  133 
maximum  value  of,  1207  ex. 
tails  of,  1208  1215 
binomial  expansion,  1186 
binomial  heap,  527  pr. 
binomial  tree,  527  pr. 
bin  packing,  1134  pr. 
bipartite  graph,  1172 

corresponding  flow  network  of,  732 
d  regular,  736  ex. 
and  hypergraphs,  1173  ex. 
bipartite  matching,  530,  732  736,  747  ex.,  766 
Hopcroft  Karp  algorithm  for,  763  pr. 
birthday  paradox,  130  133,  142  ex.
bisection  of  a  tree,  1181  pr. 
bitonic  euclidean  traveling  salesman  problem, 
405  pr. 

bitonic  sequence,  682  pr. 
bitonic  tour,  405  pr. 
bit  operation,  927 

in  Euclid’s  algorithm,  981  pr. 
bit  reversal  permutation,  472 pr.,  918 
Bit  Reverse  Copy,  918 
bit  reversed  binary  counter,  472  pr. 

Bit  Reversed  Increment, 472 pr. 

bit  vector,  255  ex.,  532  536 

black  height,  309 

black  vertex,  594,  603 

blocking  flow,  765 

block  structure  in  pseudocode,  20 

Bob,  959 

Boole’s  inequality,  1195  ex. 
boolean  combinational  circuit,  1071 
boolean  combinational  element,  1070 
boolean  connective,  1079 
boolean  formula,  1049,  1066  ex.,  1079,

1086  ex. 

boolean  function,  1187  ex. 
boolean  matrix  multiplication,  832  ex. 
Boruvka’s  algorithm,  641 


bottleneck  spanning  tree,  640  pr. 
bottleneck  traveling  salesman  problem, 

1117  ex.

bottom  of  a  stack,  233 
Bottom  Up  Cut  Rod,  366 
bottom  up  method,  for  dynamic  programming, 
365 
bound 

asymptotically  tight,  45 

asymptotic  lower,  48 

asymptotic  upper,  47 

on  binomial  coefficients,  1186  1187 

on  binomial  distributions,  1206 

polylogarithmic,  57 

on  the  tails  of  a  binomial  distribution, 

1208  1215 

see  also  lower  bounds 
boundary  condition,  in  a  recurrence,  67,  84 
boundary  of  a  polygon,  1020  ex. 
bounding  a  summation,  1149  1156
box,  nesting,  678  pr. 

B+  tree,  488 

branching  factor,  in  B  trees,  487 
branch  instructions,  23 
breadth  first  search,  594  602,  623 
in  maximum  flow,  727  730,  766 
and  shortest  paths,  597  600,  644 
similarity  to  Dijkstra’s  algorithm,  662, 

663  ex. 

breadth  first  tree,  594,  600 
bridge,  621  pr. 

B*  tree,  489  n. 

B  tree,  484  504 

compared  with  red  black  trees,  484,  490 
creating,  492 
deletion  from,  499  502 
full  node  in,  489 
height  of,  489  490 
insertion  into,  493  497 
minimum  degree  of,  489 
minimum  key  of,  497  ex. 
properties  of,  488  491 
searching,  491  492 
splitting  a  node  in,  493  495 
2  3  4  trees,  489 
B  Tree  Create, 492 
B  Tree  Delete,  499 
B  Tree  Insert,  495 
B  Tree  Insert  Nonfull,  496
B  Tree  Search,  492, 499  ex. 

B  Tree  Split  Child, 494 
BUBBLESORT,  40  pr. 
bucket,  200 
bucket  sort,  200  204 
Bucket  Sort,  201 
Build  Max  Heap,  157 
Build  Max  Heap',  167  pr.

Build  Min  Heap,  159 
butterfly  operation,  915 
by,  in  pseudocode,  21 

cache,  24,  449  pr. 
cache  hit,  449  pr. 
cache  miss,  449  pr. 
cache  obliviousness,  504 
caching,  off  line,  449  pr. 
call 

in  a  multithreaded  computation,  776 
of  a  subroutine,  23,  25  n. 
by  value,  21
call  edge,  778 
cancellation  lemma,  907 
cancellation  of  flow,  717 
canonical  form  for  task  scheduling,  444 
capacity 
of  a  cut,  721 
of  an  edge,  709 
residual,  716,  719 
of  a  vertex,  714  ex.
capacity  constraint,  709  710 
cardinality  of  a  set  (|  |),  1161 
Carmichael  number,  968,  975  ex. 
Cartesian  product  (×),  1162
Cartesian  sum,  906  ex. 
cascading  cut,  520 
Cascading  Cut,  519 
Catalan  numbers,  306 pr.,  372 
ceiling  function  (⌈  ⌉),  54
in  master  theorem,  103  106 
ceiling  instruction,  23 
certain  event,  1190 
certificate 

in  a  cryptosystem,  964 
for  verification  algorithms,  1063 
Chained  Hash  Delete,  258 
Chained  Hash  Insert,  258 


Chained  Hash  Search,  258 
chaining,  257  260,  283  pr. 
chain  of  a  convex  hull,  1038 
changing  a  key,  in  a  Fibonacci  heap,  529  pr. 
changing  variables,  in  the  substitution  method, 
86  87 

character  code,  428 

chess  playing  program,  790  791 

child 

in  a  binary  tree,  1178
in  a  multithreaded  computation,  776
in  a  rooted  tree,  1176
child  list  in  a  Fibonacci  heap,  507 
Chinese  remainder  theorem,  950  954,  983 
chip  multiprocessor,  772 
chirp  transform,  914  ex.
choose  (\binom{n}{k}),  1185
chord,  345  ex. 

Cilk,  774,  812 
Cilk++,  774,  812 
ciphertext,  960 
circuit 

boolean  combinational,  1071 
depth  of,  919 

for  fast  Fourier  transform,  919  920 
CIRCUIT  SAT,  1072 
circuit  satisfiability,  1070  1077 
circular,  doubly  linked  list  with  a  sentinel,  239 
circular  linked  list,  236 
see  also  linked  list 
class 

complexity,  1059 
equivalence,  1164 
classification  of  edges 

in  breadth  first  search,  621  pr.
in  depth  first  search,  609  610,  611  ex.
in  a  multithreaded  dag,  778  779 
clause,  1081  1082 
clean  area,  208  pr. 
clique,  1086  1089,  1105

approximation  algorithm  for,  1111  ex., 

1134  pr.

CLIQUE,  1087 

closed  interval,  348 

closed  semiring,  707 

closest  pair,  finding,  1039  1044,  1047 

closest  point  heuristic,  1117  ex. 
closure

group  property,  939 
of  a  language,  1058 
operator  (*),  1058 
transitive,  see  transitive  closure 
cluster 

in  a  bit  vector  with  a  superimposed  tree  of 
constant  height,  534 
for  parallel  computing,  772 
in  proto  van  Emde  Boas  structures,  538 
in  van  Emde  Boas  trees,  546 
clustering,  272 

CNF  (conjunctive  normal  form),  1049,  1082 
CNF  satisfiability,  1127  ex. 
coarsening  leaves  of  recursion 
in  merge  sort,  39  pr. 
when  recursively  spawning,  787 
code,  428  429 
Huffman,  428  437,  450
codeword,  429 
codomain,  1166 
coefficient 

binomial,  1186 
of  a  polynomial,  55,  898 
in  slack  form,  856 
coefficient  representation,  900 
and  fast  multiplication,  903  905 
cofactor,  1224 
coin  changing,  446  pr. 
colinearity,  1016 
collision,  257 

resolution  by  chaining,  257  260 
resolution  by  open  addressing,  269  277 
collision  resistant  hash  function,  964 
coloring,  1103  pr.,  1180  pr.
color,  of  a  red  black  tree  node,  308 
column  major  order,  208  pr. 
column  rank,  1223 
columnsort,  208  pr. 
column  vector,  1218 
combination,  1185 
combinational  circuit,  1071 
combinational  element,  1070 
combine  step,  in  divide  and  conquer,  30,  65 
comment,  in  pseudocode  (//),  21 
commodity,  862 
common  divisor,  929 

greatest,  see  greatest  common  divisor 


common  multiple,  939  ex. 
common  subexpression,  915 
common  subsequence,  7,  391 
longest,  7,  390  397,  413
commutative  laws  for  sets,  1159 
commutative  operation,  940 
Compactify  List,  245  ex. 
compact  list,  250  pr. 

Compact  List  Search, 250 pr. 
Compact  List  Search',  251  pr. 
comparable  line  segments,  1022 
Compare  Exchange,  208 pr. 
compare  exchange  operation,  208  pr. 
comparison  sort,  191 

and  binary  search  trees,  289  ex. 
randomized,  205  pr. 
and  selection,  222 
compatible  activities,  415 
compatible  matrices,  371,  1221 
competitive  analysis,  476  pr. 
complement 

of  an  event,  1190 
of  a  graph,  1090 
of  a  language,  1058 
Schur,  820,  834 
of  a  set,  1160 

complementary  slackness,  894  pr. 
complete  graph,  1172 
complete  k  ary  tree,  1179 
see  also  heap 

completeness  of  a  language,  1077  ex. 
complete  step,  782 
completion  time,  447  pr.,  1136  pr.
complexity  class,  1059 
co  NP,  1064 
NP,  1049,  1064 
NPC,  1050,  1069 
P,  1049,  1055 
complexity  measure,  1059 
complex  numbers 

inverting  matrices  of,  832  ex. 
multiplication  of,  83  ex. 
complex  root  of  unity,  906 
interpolation  at,  912  913
component 

biconnected,  621  pr.
connected,  1170 
strongly  connected,  1170 
component  graph,  617
composite  number,  928 
witness  to,  968 

composition,  of  multithreaded  computations, 
784  fig. 

computational  depth,  812 
computational  geometry,  1014  1047 
computational  problem,  5  6 
computation  dag,  777 
computation,  multithreaded,  777 
Compute  Prefix  Function,  1006 
Compute  Transition  Function,  1001 
concatenation 

of  languages,  1058 
of  strings,  986 
concrete  problem,  1055 
concurrency  keywords,  774,  776,  785 
concurrency  platform,  773 
conditional  branch  instruction,  23 
conditional  independence,  1195  ex. 
conditional  probability,  1192,  1194
configuration,  1074
conjugate  of  the  golden  ratio  (φ̂),  59
conjugate  transpose,  832  ex. 
conjunctive  normal  form,  1049,  1082 
connected  component,  1170 

identified  using  depth  first  search,  612  ex.
identified  using  disjoint  set  data  structures, 
562  564 

Connected  Components,  563 
connected  graph,  1170 
connective,  1079 
co  NP  (complexity  class),  1064 
conquer  step,  in  divide  and  conquer,  30,  65 
conservation  of  flow,  709  710
consistency 
of  literals,  1088 
sequential,  779,  812 
Consolidate,  516 
consolidating  a  Fibonacci  heap  root  list, 

513  517 
constraint,  851 
difference,  665 
equality,  670  ex.,  852  853 
inequality,  852  853 
linear,  846 

nonnegativity,  851,  853 
tight,  865 


violation  of,  865 
constraint  graph,  666  668 
contain,  in  a  path,  1170 
continuation  edge,  778 
continuous  uniform  probability  distribution, 
1192 

contraction 

of  a  dynamic  table,  467  471
of  a  matroid,  442 

of  an  undirected  graph  by  an  edge,  1172 
control  instructions,  23 
convergence  property,  650,  672  673 
convergent  series,  1146
converting  binary  to  decimal,  933  ex. 
convex  combination  of  points,  1015 
convex  function,  1199 
convex  hull,  8,  1029  1039,  1046  pr.
convex  layers,  1044  pr.
convex  polygon,  1020  ex.
convex  set,  714  ex.
convolution  (⊗),  901
convolution  theorem,  913 
copy  instruction,  23 
correctness  of  an  algorithm,  6 
corresponding  flow  network  for  bipartite 
matching,  732 
countably  infinite  set,  1161 
counter,  see  binary  counter 
counting,  1183  1189 
probabilistic,  143  pr. 
counting  sort,  194  197 
in  radix  sort,  198 
Counting  Sort,  195 
coupon  collector’s  problem,  134 
cover 

path,  761  pr. 
by  a  subset,  1118 

vertex,  1089,  1108,  1124  1127,  1139 
covertical,  1024 

Create  New  RS  vEB  Tree,  557 pr. 
credit,  456 
critical  edge,  729 
critical  path 
of  a  dag,  657 

of  a  multithreaded  computation,  779 
cross  a  cut,  626 
cross  edge,  609 
cross  product  (×),  1016
cryptosystem,  958  965,  983

cubic  spline,  840  pr. 

currency  exchange,  390  ex.,  679  pr. 

curve  fitting,  835  839 

cut 

capacity  of,  721 

cascading,  520 

of  a  flow  network,  720  724 

minimum,  721,  731  ex. 

net  flow  across,  720 

of  an  undirected  graph,  626 

weight  of,  1127  ex. 

CUT,  519 
Cut  Rod,  363 

cutting,  in  a  Fibonacci  heap,  519 
cycle  of  a  graph,  1170 
hamiltonian,  1049,  1061 
minimum  mean  weight,  680  pr. 
negative  weight,  see  negative  weight  cycle 
and  shortest  paths,  646  647 
cyclic  group,  955 
cyclic  rotation,  1012  ex.
cycling,  of  simplex  algorithm,  875 

dag,  see  directed  acyclic  graph 
Dag  Shortest  Paths,  655 
d  ary  heap,  167  pr. 

in  shortest  paths  algorithms,  706  pr. 
data  movement  instructions,  23 
data  parallel  model,  811 
data  structure,  9,  229  355,  481  585
AA  trees,  338 
augmentation  of,  339  355 
AVL  trees,  333  pr.,  337 
binary  search  trees,  286  307 
binomial  heaps,  527  pr. 
bit  vectors,  255  ex.,  532  536 
B  trees,  484  504 
deques,  236  ex. 
dictionaries,  229 
direct  address  tables,  254  255 
for  disjoint  sets,  561  585 
for  dynamic  graphs,  483 
dynamic  sets,  229  231 
dynamic  trees,  482 
exponential  search  trees,  212,  483 
Fibonacci  heaps,  505  530 
fusion  trees,  212,  483 


hash  tables,  256  261 

heaps,  151  169 

interval  trees,  348  354 

k  neighbor  trees,  338 

linked  lists,  236  241 

mergeable  heap,  505 

order  statistic  trees,  339  345 

persistent,  331  pr.,  482 

potential  of,  459 

priority  queues,  162  166 

proto  van  Emde  Boas  structures,  538  545

queues,  232,  234  235 

radix  trees,  304  pr. 

red  black  trees,  308  338 

relaxed  heaps,  530 

rooted  trees,  246  249 

scapegoat  trees,  338 

on  secondary  storage,  484  487 

skip  lists,  338 

splay  trees,  338,  482 

stacks,  232  233 

treaps,  333  pr.,  338 

2  3  4  heaps,  529  pr. 

2  3  4  trees,  489,  503  pr. 

2  3  trees,  337,  504 
van  Emde  Boas  trees,  531  560 
weight  balanced  trees,  338 
data  type,  23 
deadline,  444 

deallocation  of  objects,  243  244 
decision  by  an  algorithm,  1058  1059 
decision  problem,  1051,  1054 
and  optimization  problems,  1051
decision  tree,  192  193 
Decrease  Key,  162, 505 
decreasing  a  key 

in  Fibonacci  heaps,  519  522 
in  2  3  4  heaps,  529  pr. 

Decrement,  456  ex. 
degeneracy,  874 
degree 

of  a  binomial  tree  root,  527  pr. 
maximum,  of  a  Fibonacci  heap,  509, 

523  526 

minimum,  of  a  B  tree,  489 
of  a  node,  1177 
of  a  polynomial,  55,  898 
of  a  vertex,  1169 
degree  bound,  898

Delete,  230, 505 

Delete  Larger  Half, 463  ex. 

deletion 

from  binary  search  trees,  295  298,  299  ex. 
from  a  bit  vector  with  a  superimposed  binary 
tree,  534 

from  a  bit  vector  with  a  superimposed  tree  of 
constant  height,  535 
from  B  trees,  499  502 
from  chained  hash  tables,  258 
from  direct  address  tables,  254 
from  dynamic  tables,  467  471 
from  Fibonacci  heaps,  522,  526 pr. 
from  heaps,  166  ex. 
from  interval  trees,  349 
from  linked  lists,  238 
from  open  address  hash  tables,  271
from  order  statistic  trees,  343  344 
from  proto  van  Emde  Boas  structures,  544 
from  queues,  234 
from  red  black  trees,  323  330 
from  stacks,  232 
from  sweep  line  statuses,  1024 
from  2  3  4  heaps,  529  pr. 
from  van  Emde  Boas  trees,  554  556 
DeMorgan’s  laws 

for  propositional  logic,  1083 
for  sets,  1160,  1162  ex. 
dense  graph,  589 
ε  dense,  706  pr.
density 

of  prime  numbers,  965  966 
of  a  rod,  370  ex. 
dependence 

and  indicator  random  variables,  119 
linear,  1223 
see  also  independence 
depth 

average,  of  a  node  in  a  randomly  built  binary 
search  tree,  304 pr. 
of  a  circuit,  919 
of  a  node  in  a  rooted  tree,  1177
of  quicksort  recursion  tree,  178  ex. 
of  a  stack,  188  pr. 

depth  determination  problem,  583  pr. 

depth  first  forest,  603 

depth  first  search,  603  612,  623 


in  finding  articulation  points,  bridges,  and 
biconnected  components,  621  pr. 
in  finding  strongly  connected  components, 
615  621,  623

in  topological  sorting,  612  615 
depth  first  tree,  603 
deque,  236  ex. 

Dequeue,  235 
derivative  of  a  series,  1147
descendant,  1176 
destination  vertex,  644 
det,  see  determinant 
determinacy  race,  788 
determinant,  1224  1225 

and  matrix  multiplication,  832  ex. 
deterministic  algorithm,  123 
multithreaded,  787 
Deterministic  Search,  143  pr. 

DFS,  604 
DFS  VISIT,  604 

DFT  (discrete  Fourier  transform),  9,  909 
diagonal  matrix,  1218 

LUP  decomposition  of,  827  ex. 
diameter  of  a  tree,  602  ex. 
dictionary,  229 

difference  constraints,  664  670 
difference  equation,  see  recurrence 
difference  of  sets  (−),  1159
symmetric,  763  pr. 
differentiation  of  a  series,  1147
digital  signature,  960 
digraph,  see  directed  graph 
DIJKSTRA,  658

Dijkstra’s  algorithm,  658  664,  682 
for  all  pairs  shortest  paths,  684,  704 
implemented  with  a  Fibonacci  heap,  662 
implemented  with  a  min  heap,  662 
with  integer  edge  weights,  664  ex. 
in  Johnson’s  algorithm,  702 
similarity  to  breadth  first  search,  662, 

663  ex. 

similarity  to  Prim’s  algorithm,  634,  662 
Direct  Address  Delete,  254 
direct  addressing,  254  255,  532  536 
Direct  Address  Insert,  254 
Direct  Address  Search,  254 
direct  address  table,  254  255 
directed  acyclic  graph  (dag),  1172 
and  back  edges,  613
and  component  graphs,  617 
and  hamiltonian  paths,  1066  ex. 
longest  simple  path  in,  404  pr. 
for  representing  a  multithreaded 
computation,  777 

single  source  shortest  paths  algorithm  for, 
655  658 

topological  sort  of,  612  615,  623
directed  graph,  1168 

all  pairs  shortest  paths  in,  684  707 
constraint  graph,  666 
Euler  tour  of,  623  pr.,  1048 
hamiltonian  cycle  of,  1049 
and  longest  paths,  1048 
path  cover  of,  761  pr. 

PERT  chart,  657,  657  ex. 
semiconnected,  621  ex. 
shortest  path  in,  643 
single  source  shortest  paths  in,  643  683 
singly  connected,  612  ex. 
square  of,  593  ex. 
transitive  closure  of,  697 
transpose  of,  592  ex. 
universal  sink  in,  593  ex. 
see  also  directed  acyclic  graph,  graph, 
network 

directed  segment,  1015  1017 
directed  version  of  an  undirected  graph,  1172 
Direction,  1018 
dirty  area,  208  pr. 

Discharge,  751 

discharge  of  an  overflowing  vertex,  751 
discovered  vertex,  594,  603 
discovery  time,  in  depth  first  search,  605 
discrete  Fourier  transform,  9,  909 
discrete  logarithm,  955 
discrete  logarithm  theorem,  955 
discrete  probability  distribution,  1191 
discrete  random  variable,  1196  1201 
disjoint  set  data  structure,  561  585 
analysis  of,  575  581,  581  ex. 
in  connected  components,  562  564 
in  depth  determination,  583  pr. 
disjoint  set  forest  implementation  of, 

568  572 

in  Kruskal’s  algorithm,  631 
linear  time  special  case  of,  585 


linked  list  implementation  of,  564  568 
in  off  line  least  common  ancestors,  584  pr. 
in  off  line  minimum,  582  pr. 
in  task  scheduling,  448  pr. 
disjoint  set  forest,  568  572 
analysis  of,  575  581,  581  ex. 
rank  properties  of,  575,  581  ex.
see  also  disjoint  set  data  structure 
disjoint  sets,  1161
disjunctive  normal  form,  1083 
disk,  1028  ex. 
disk  drive,  485  487 

see  also  secondary  storage 
Disk  Read,  487 
Disk  Write,  487 
distance 
edit,  406  pr. 
euclidean,  1039 
L_m,  1044  ex.

Manhattan,  225  pr.,  1044  ex. 
of  a  shortest  path,  597 
distributed  memory,  772 
distribution 

binomial,  1203  1206 
continuous  uniform,  1192 
discrete,  1191 
geometric,  1202  1203 
of  inputs,  116,  122 
of  prime  numbers,  965 
probability,  1190 
sparse  hulled,  1046  pr.
uniform,  1191 

distributive  laws  for  sets,  1160
divergent  series,  1146
divide  and  conquer  method,  30  35,  65 
analysis  of,  34  35 
for  binary  search,  39  ex. 
for  conversion  of  binary  to  decimal,  933  ex. 
for  fast  Fourier  transform,  909  912 
for  finding  the  closest  pair  of  points, 

1040  1043 

for  finding  the  convex  hull,  1030 

for  matrix  inversion,  829  831 

for  matrix  multiplication,  76  83,  792  797 

for  maximum  subarray  problem,  68  75 

for  merge  sort,  30  37,  797  805 

for  multiplication,  920  pr. 




for  multithreaded  matrix  multiplication, 
792  797 

for  multithreaded  merge  sort,  797  805 
for  quicksort,  170  190 
relation  to  dynamic  programming,  359 
for  selection,  215  224 
solving  recurrences  for,  83  106,  112  113
for  Strassen’s  algorithm,  79  83 
divide  instruction,  23 
divides  relation  (|),  927 
divide  step,  in  divide  and  conquer,  30,  65 
division  method,  263,  268  269  ex. 
division  theorem,  928 
divisor,  927  928 
common,  929 

see  also  greatest  common  divisor 
DNA,  6  7,  390  391,  406  pr.

DNF  (disjunctive  normal  form),  1083 
does  not  divide  relation  (∤),  927
domain,  1166 

dominates  relation,  1045  pr. 
double  hashing,  272  274,  277  ex. 
doubly  linked  list,  236 
see  also  linked  list 
downto,  in  pseudocode,  21 
d  regular  graph,  736  ex.
duality,  879  886,  895  pr. 

weak,  880  881,  886  ex.
dual  linear  program,  879 
dummy  key,  397 
dynamic  graph,  562  n. 

all  pairs  shortest  paths  algorithms  for,  707 

data  structures  for,  483 

minimum  spanning  tree  algorithm  for, 

637  ex. 

transitive  closure  of,  705  pr.,  707 
dynamic  multithreaded  algorithm,  see 
multithreaded  algorithm 
dynamic  multithreading,  773 
dynamic  order  statistics,  339  345 
dynamic  programming  method,  359  413 
for  activity  selection,  421  ex. 
for  all  pairs  shortest  paths,  686  697 
for  bitonic  euclidean  traveling  salesman 
problem,  405  pr. 
bottom  up,  365 
for  breaking  a  string,  410  pr. 


compared  with  greedy  algorithms,  381, 

390  ex.,  418,  423  427
for  edit  distance,  406  pr. 
elements  of,  378  390 
for  Floyd  Warshall  algorithm,  693  697 
for  inventory  planning,  411  pr.
for  longest  common  subsequence,  390  397 
for  longest  palindrome  subsequence,  405  pr. 
for  longest  simple  path  in  a  weighted 
directed  acyclic  graph,  404  pr. 
for  matrix  chain  multiplication,  370  378 
and  memoization,  387  389 
for  optimal  binary  search  trees,  397  404 
optimal  substructure  in,  379  384 
overlapping  subproblems  in,  384  386 
for  printing  neatly,  405  pr. 
reconstructing  an  optimal  solution  in,  387 
relation  to  divide  and  conquer,  359 
for  rod  cutting,  360  370 
for  seam  carving,  409  pr. 
for  signing  free  agents,  411  pr.
top  down  with  memoization,  365 
for  transitive  closure,  697  699 
for  Viterbi  algorithm,  408  pr. 
for  0  1  knapsack  problem,  427  ex. 
dynamic  set,  229  231
see  also  data  structure
dynamic  table,  463  471

analyzed  by  accounting  method,  465  466 
analyzed  by  aggregate  analysis,  465 
analyzed  by  potential  method,  466  471
load  factor  of,  463 
dynamic  tree,  482 

e,  55 

E  [  ]  (expected  value),  1197 
early  first  form,  444 
early  task,  444 
edge,  1168 

admissible,  749 
antiparallel,  711  712 
attributes  of,  592 
back,  609 
bridge,  621  pr. 
call,  778 
capacity  of,  709 

classification  in  breadth  first  search,  621  pr.
classification  in  depth  first  search,  609  610 
continuation,  778
critical,  729 
cross,  609 
forward,  609 
inadmissible,  749 
light,  626 

negative  weight,  645  646 
residual,  716 
return,  779 
safe,  626 
saturated,  739 
spawn,  778 
tree,  601,  603,  609 
weight  of,  591
edge  connectivity,  731  ex.
edge  set,  1168 
edit  distance,  406  pr. 

Edmonds  Karp  algorithm,  727  730 

elementary  event,  1189 

elementary  insertion,  465 

element  of  a  set  (∈),  1158

ellipsoid  algorithm,  850,  897 

elliptic  curve  factorization  method,  984 

elseif,  in  pseudocode,  20  n. 

else,  in  pseudocode,  20 

empty  language  (∅),  1058

empty  set  (∅),  1158

empty  set  laws,  1159 

empty  stack,  233 

empty  string  (ε),  986,  1058

empty  tree,  1178

encoding  of  problem  instances,  1055  1057 
endpoint 

of  an  interval,  348 
of  a  line  segment,  1015 
Enqueue,  235 
entering  a  vertex,  1169 
entering  variable,  867 
entropy  function,  1187 
ε  dense  graph,  706  pr.
ε  universal  hash  function,  269  ex.
equality 

of  functions,  1166 
linear,  845 
of  sets,  1158

equality  constraint,  670  ex.,  852 
and  inequality  constraints,  853 
tight,  865 


violation  of,  865 
equation 

and  asymptotic  notation,  49  50 
normal,  837 

recurrence,  see  recurrence 
equivalence  class,  1164 
modulo  n  ([a]_n),  928
equivalence,  modular  (≡),  54,  1165  ex.
equivalence  relation,  1164 

and  modular  equivalence,  1165  ex. 
equivalent  linear  programs,  852 
error,  in  pseudocode,  22 
escape  problem,  760  pr. 

Euclid,  935 

Euclid’s  algorithm,  933  939,  981  pr.,  983 

euclidean  distance,  1039 

euclidean  norm  (||  ||),  1222 

Euler’s  constant,  943 

Euler’s  phi  function,  943 

Euler’s  theorem,  954,  975  ex. 

Euler  tour,  623  pr.,  1048 

and  hamiltonian  cycles,  1048 
evaluation  of  a  polynomial,  41  pr.,  900,  905  ex. 
derivatives  of,  922  pr. 
at  multiple  points,  923  pr. 
event,  1190 
event  point,  1023 
event  point  schedule,  1023 
Exact  Subset  Sum,  1129 
excess  flow,  736 
exchange  property,  437 
exclusion  and  inclusion,  1163  ex. 
execute  a  subroutine,  25  n. 
expansion  of  a  dynamic  table,  464  467 
expectation,  see  expected  value 
expected  running  time,  28,  117 
expected  value,  1197  1199 
of  a  binomial  distribution,  1204 
of  a  geometric  distribution,  1202 
of  an  indicator  random  variable,  118 
explored  vertex,  605 
exponential  function,  55  56 
exponential  height,  300 
exponential  search  tree,  212,  483 
exponential  series,  1147 
exponentiation  instruction,  24 
exponentiation,  modular,  956 
Extended  Bottom  Up  Cut  Rod,  369 
Extended  Euclid,  937
Extend  Shortest  Paths,  688 
extension  of  a  set,  438 
exterior  of  a  polygon,  1020  ex. 
external  node,  1176 
external  path  length,  1180  ex.
extracting  the  maximum  key 
from  d  ary  heaps,  167  pr. 
from  max  heaps,  163 
extracting  the  minimum  key 
from  Fibonacci  heaps,  512  518 
from  2  3  4  heaps,  529  pr. 
from  Young  tableaus,  167  pr. 

Extract  Max,  162  163 
Extract  Min,  162, 505 

factor,  928 
twiddle,  912 

factorial  function  (!),  57  58 
factorization,  975  980,  984 
unique,  931

failure,  in  a  Bernoulli  trial,  1201 
fair  coin,  1191 
fan  out,  1071 
Farkas’s  lemma,  895  pr. 
farthest  pair  problem,  1030 
Faster  All  Pairs  Shortest  Paths, 691, 
692  ex. 

fast  Fourier  transform  (FFT),  898  925 
circuit  for,  919  920 
iterative  implementation  of,  915  918 
multidimensional,  921  pr.
multithreaded  algorithm  for,  804  ex. 
recursive  implementation  of,  909  912 
using  modular  arithmetic,  923  pr. 
feasibility  problem,  665,  894  pr. 
feasible  linear  program,  851
feasible  region,  847 
feasible  solution,  665,  846,  851 
Fermat’s  theorem,  954 
FFT,  see  fast  Fourier  transform 
FFTW,  924 
FIB, 775 

Fib  Heap  Change  Key,  529  pr. 

Fib  Heap  Decrease  Key,  519
Fib  Heap  Delete,  522 
Fib  Heap  Extract  Min,  513 
Fib  Heap  Insert,  510 


Fib  Heap  Link,  516 
Fib  Heap  Prune,  529  pr. 

Fib  Heap  Union,  512
Fibonacci  heap,  505  530 
changing  a  key  in,  529  pr. 
compared  with  binary  heaps,  506  507 
creating,  510 

decreasing  a  key  in,  519  522

deletion  from,  522,  526  pr. 

in  Dijkstra’s  algorithm,  662 

extracting  the  minimum  key  from,  512  518 

insertion  into,  510  511 

in  Johnson’s  algorithm,  704 

maximum  degree  of,  509,  523  526 

minimum  key  of,  511

potential  function  for,  509 

in  Prim’s  algorithm,  636 

pruning,  529  pr. 

running  times  of  operations  on,  506  fig. 
uniting,  511  512 

Fibonacci  numbers,  59  60,  108  pr.,  523 
computation  of,  774  780,  981  pr. 

FIFO  (first  in,  first  out),  232 
see  also  queue 
final  state  function,  996 
final  strand,  779 
Find  Depth,  583  pr. 

Find  Max  Crossing  Subarray, 71 
Find  Maximum  Subarray,  72 
find  path,  569 
Find  Set,  562 

disjoint  set  forest  implementation  of,  571, 
585 

linked  list  implementation  of,  564 
finished  vertex,  603 

finishing  time,  in  depth  first  search,  605 
and  strongly  connected  components,  618 
finish  time,  in  activity  selection,  415 
finite  automaton,  995 

for  string  matching,  996  1002 
Finite  Automaton  Matcher,  999 
finite  group,  940 
finite  sequence,  1166 
finite  set,  1161 
first  fit  heuristic,  1134  pr. 
first  in,  first  out,  232 
see  also  queue 
fixed  length  code,  429 
floating  point  data  type,  23
floor  function  (⌊  ⌋),  54

in  master  theorem,  103  106 
floor  instruction,  23 
flow,  709  714 
aggregate,  863 
augmentation  of,  716 
blocking,  765 
cancellation  of,  717 
excess,  736 
integer  valued,  733 
net,  across  a  cut,  720 
value  of,  710

flow  conservation,  709  710 
flow  network,  709  714

corresponding  to  a  bipartite  graph,  732 
cut  of,  720  724 

with  multiple  sources  and  sinks,  712 
Floyd  Warshall,  695
Floyd  Warshall',  699  ex. 

Floyd  Warshall  algorithm,  693  697, 

699  700  ex.,  706
multithreaded,  797  ex. 

Ford  Fulkerson,  724 
Ford  Fulkerson  method,  714  731,  765 
Ford  Fulkerson  Method,  715 
forest,  1172  1173 
depth  first,  603 
disjoint  set,  568  572 
for,  in  pseudocode,  20  21 
and  loop  invariants,  19  n.
formal  power  series,  108  pr. 
formula  satisfiability,  1079  1081,  1105 
forward  edge,  609 
forward  substitution,  816  817 
Fourier  transform,  see  discrete  Fourier 
transform,  fast  Fourier  transform 
fractional  knapsack  problem,  426,  428  ex. 
free  agent,  411  pr.
freeing  of  objects,  243  244 
free  list,  243 
Free  Object,  244 
free  tree,  1172  1176
frequency  domain,  898 
full  binary  tree,  1178,  1180  ex.

relation  to  optimal  code,  430 
full  node,  489 
full  rank,  1223 


full  walk  of  a  tree,  1114 
fully  parenthesized  matrix  chain  product,  370 
fully  polynomial  time  approximation  scheme, 
1107 

for  subset  sum,  1128  1134,  1139
function,  1166  1168
Ackermann’s,  585 
basis,  835 
convex,  1199 
final  state,  996 
hash,  see  hash  function 
linear,  26,  845 
objective,  664,  847,  851 
potential,  459 
prefix,  1003  1004 
quadratic,  27 
reduction,  1067 
suffix,  996 

transition,  995,  1001  1002,  1012  ex. 
functional  iteration,  58 
fundamental  theorem  of  linear  programming, 
892 

furthest  in  future  strategy,  449  pr. 
fusion  tree,  212,  483 
fuzzy  sorting,  189  pr.

Gabow’s  scaling  algorithm  for  single  source 
shortest  paths,  679  pr. 
gap  character,  989  ex.,  1002  ex. 
gap  heuristic,  760  ex.,  766 
garbage  collection,  151,  243 
gate,  1070 

Gaussian  elimination,  819,  842 
gcd,  see  greatest  common  divisor 
general  number  field  sieve,  984 
generating  function,  108  pr. 
generator 

of  a  subgroup,  944 
of  Z_n^*,  955
Generic  MST,  626
Generic  Push  Relabel,  741 
generic  push  relabel  algorithm,  740  748 
geometric  distribution,  1202  1203 
and  balls  and  bins,  134 
geometric  series,  1147 
geometry,  computational,  1014  1047 
GF(2),  1227  pr.
gift  wrapping,  1037,  1047 
global  variable,  21

Goldberg’s  algorithm,  see  push  relabel 
algorithm 

golden  ratio  (φ),  59,  108  pr.
gossiping,  478 
Graft,  583  pr. 

Graham’s  scan,  1030  1036,  1047 
Graham  Scan,  1031 
graph,  1168  1173 

adjacency  list  representation  of,  590 

adjacency  matrix  representation  of,  591 

algorithms  for,  587  766 

and  asymptotic  notation,  588 

attributes  of,  588,  592 

breadth  first  search  of,  594  602,  623 

coloring  of,  1103  pr.

complement  of,  1090 

component,  617 

constraint,  666  668 

dense,  589 

depth  first  search  of,  603  612,  623 

dynamic,  562  n. 

ε  dense,  706  pr.

hamiltonian,  1061 

incidence  matrix  of,  448  pr.,  593  ex. 

interval,  422  ex. 

nonhamiltonian,  1061 

shortest  path  in,  597 

singly  connected,  612  ex. 

sparse,  589 

static,  562  n. 

subproblem,  367  368 

tour  of,  1096 

weighted,  591 

see  also  directed  acyclic  graph,  directed 
graph,  flow  network,  undirected  graph, 
tree 

graphic  matroid,  437  438,  642 
GRAPH  ISOMORPHISM,  1065  ex. 
gray  vertex,  594,  603 
greatest  common  divisor  (gcd),  929  930, 

933  ex. 

binary  gcd  algorithm  for,  981  pr. 

Euclid’s  algorithm  for,  933  939,  981  pr.,  983 
with  more  than  two  arguments,  939  ex. 
recursion  theorem  for,  934 
greedoid,  450 
Greedy,  440 


Greedy  Activity  Selector,  421 
greedy  algorithm,  414  450 
for  activity  selection,  415  422 
for  coin  changing,  446  pr. 
compared  with  dynamic  programming,  381, 
390  ex.,  418,  423  427
Dijkstra’s  algorithm,  658  664 
elements  of,  423  428 
for  fractional  knapsack  problem,  426 
greedy  choice  property  in,  424  425 
for  Huffman  code,  428  437 
Kruskal’s  algorithm,  631  633
and  matroids,  437  443 
for  minimum  spanning  tree,  631  638 
for  multithreaded  scheduling,  781  783
for  off  line  caching,  449  pr. 
optimal  substructure  in,  425 
Prim’s  algorithm,  634  636 
for  set  cover,  1117  1122,  1139
for  task  scheduling,  443  446,  447  448  pr. 
on  a  weighted  matroid,  439  442 
for  weighted  set  cover,  1135  pr. 
greedy  choice  property,  424  425 
of  activity  selection,  417  418 
of  Huffman  codes,  433  434 
of  a  weighted  matroid,  441 
greedy  scheduler,  782 
Greedy  Set  Cover,  1119 
grid,  760  pr. 
group,  939  946 
cyclic,  955 
operator  (⊕),  939

guessing  the  solution,  in  the  substitution 
method,  84  85 

half  3  CNF  satisfiability,  1101  ex.
half  open  interval,  348 
Hall’s  theorem,  735  ex. 
halting  problem,  1048 
halving  lemma,  908 
HAM  CYCLE,  1062 

hamiltonian  cycle,  1049,  1061,  1091  1096, 
1105 

hamiltonian  graph,  1061 
hamiltonian  path,  1066  ex.,  1101  ex. 

HAM  PATH,  1066  ex. 
handle,  163,  507 
handshaking  lemma,  1172  ex. 
harmonic  number,  1147,  1153  1154
harmonic  series,  1147,  1153  1154 
Hash  Delete,  277  ex. 
hash  function,  256,  262  269 
auxiliary,  272 
collision  resistant,  964 
division  method  for,  263,  268  269  ex. 
ε  universal,  269  ex.
multiplication  method  for,  263  264 
universal,  265  268 
hashing,  253  285 

with  chaining,  257  260,  283  pr. 
double,  272  274,  277  ex. 
k  universal,  284  pr. 
in  memoization,  365,  387 
with  open  addressing,  269  277 
perfect,  277  282,  285 
to  replace  adjacency  lists,  593  ex. 
universal,  265  268 
Hash  Insert,  270,  277  ex.

Hash  Search,  271,  277  ex. 
hash  table,  256  261 
dynamic,  471  ex.
secondary,  278 
see  also  hashing 
hash  value,  256 
hat  check  problem,  122  ex. 
head 

in  a  disk  drive,  485 
of  a  linked  list,  236 
of  a  queue,  234 
heap,  151  169

analyzed  by  potential  method,  462  ex. 

binomial,  527  pr. 

building,  156  159,  166  pr. 

compared  with  Fibonacci  heaps,  506  507 

d  ary,  167  pr.,  706  pr. 

deletion  from,  166  ex. 

in  Dijkstra’s  algorithm,  662 

extracting  the  maximum  key  from,  163 

Fibonacci,  see  Fibonacci  heap 

as  garbage  collected  storage,  151 

height  of,  153 

in  Huffman’s  algorithm,  433 
to  implement  a  mergeable  heap,  506 
increasing  a  key  in,  163  164 
insertion  into,  164 
in  Johnson’s  algorithm,  704 


max  heap,  152 
maximum  key  of,  163 
mergeable,  see  mergeable  heap 
min  heap,  153 
in  Prim’s  algorithm,  636 
as  a  priority  queue,  162  166 
relaxed,  530 

running  times  of  operations  on,  506  fig. 
and  treaps,  333  pr. 

2  3  4,  529  pr. 

Heap  Decrease  Key,  165  ex. 

Heap  Delete,  166  ex.

Heap  Extract  Max,  163 
Heap  Extract  Min,  165  ex. 

Heap  Increase  Key,  164 
Heap  Maximum,  163 
Heap  Minimum,  165  ex. 
heap  property,  152 

maintenance  of,  154  156 
vs.  binary  search  tree  property,  289  ex. 
heapsort,  151  169 
Heapsort,  160
heel,  602  ex. 
height 

of  a  binomial  tree,  527  pr. 
black-,  309
of  a  B  tree,  489  490 
of  a  d  ary  heap,  167  pr. 
of  a  decision  tree,  193 
exponential,  300 
of  a  heap,  153 

of  a  node  in  a  heap,  153,  159  ex. 
of  a  node  in  a  tree,  1177 
of  a  red  black  tree,  309 
of  a  tree,  1177 
height  balanced  tree,  333  pr. 
height  function,  in  push  relabel  algorithms,  738 
hereditary  family  of  subsets,  437 
Hermitian  matrix,  832  ex. 
high  endpoint  of  an  interval,  348 
high  function,  537,  546 
Hire  Assistant,  115 
hiring  problem,  114  115,  123  124,  145
on  line,  139  141 
probabilistic  analysis  of,  120  121 
hit 

cache,  449  pr. 
spurious,  991 
Hoare  Partition,  185  pr.

Hopcroft  Karp,  764  pr.

Hopcroft  Karp  bipartite  matching  algorithm, 
763  pr. 

horizontal  ray,  1021  ex. 

Horner’s  rule,  41  pr.,  900 

in  the  Rabin  Karp  algorithm,  990 
Huffman,  431 
Huffman  code,  428  437,  450
hull,  convex,  8,  1029  1039,  1046  pr. 

Human  Genome  Project,  6 
hyperedge,  1172 
hypergraph,  1172 

and  bipartite  graphs,  1173  ex.

ideal  parallel  computer,  779 
idempotency  laws  for  sets,  1159
identity,  939 
identity  matrix,  1218 
if,  in  pseudocode,  20 
image,  1167 

image  compression,  409  pr.,  413 
inadmissible  edge,  749 
incidence,  1169 
incidence  matrix 

and  difference  constraints,  666 
of  a  directed  graph,  448  pr.,  593  ex. 
of  an  undirected  graph,  448  pr. 
inclusion  and  exclusion,  1163  ex. 
incomplete  step,  782 
Increase  Key,  162 
increasing  a  key,  in  a  max  heap,  163  164 
Increment,  454 
incremental  design  method,  29 
for  finding  the  convex  hull,  1030 
in  degree,  1169 
indentation  in  pseudocode,  20 
independence 

of  events,  1192  1193,  1195  ex. 
of  random  variables,  1197 
of  subproblems  in  dynamic  programming, 
383  384 

independent  family  of  subsets,  437 
independent  set,  1101  pr.

of  tasks,  444 
independent  strands,  789 
index  function,  537,  546 
index  of  an  element  of  Z*_n,  955


indicator  random  variable,  118  121

in  analysis  of  expected  height  of  a  randomly 
built  binary  search  tree,  300  303 
in  analysis  of  inserting  into  a  treap,  333  pr. 
in  analysis  of  streaks,  138  139 
in  analysis  of  the  birthday  paradox,  132  133 
in  approximation  algorithm  for 
MAX  3  CNF  satisfiability,  1124 
in  bounding  the  right  tail  of  the  binomial 
distribution,  1212  1213 
in  bucket  sort  analysis,  202  204 
expected  value  of,  118 
in  hashing  analysis,  259  260 
in  hiring  problem  analysis,  120  121 
and  linearity  of  expectation,  119 
in  quicksort  analysis,  182  184,  187  pr.
in  randomized  selection  analysis,  217  219, 
226  pr. 

in  universal  hashing  analysis,  265  266 
induced  subgraph,  1171 
inequality  constraint,  852 
and  equality  constraints,  853 
inequality,  linear,  846 
infeasible  linear  program,  851 
infeasible  solution,  851 
infinite  sequence,  1166 
infinite  set,  1161 
infinite  sum,  1145 
infinity,  arithmetic  with,  650 
Initialize  Preflow,  740 
Initialize  Simplex,  871,  887 
Initialize  Single  Source,  648
initial  strand,  779 
injective  function,  1167 
inner  product,  1222 
inorder  tree  walk,  287,  293  ex.,  342 
Inorder  Tree  Walk,  288 
in  place  sorting,  17,  148,  206  pr. 
input 

to  an  algorithm,  5 
to  a  combinational  circuit,  1071 
distribution  of,  116,  122 
to  a  logic  gate,  1070 
size  of,  25 
input  alphabet,  995 
Insert,  162,  230,  463  ex.,  505
insertion 

into  binary  search  trees,  294  295 
into  a  bit  vector  with  a  superimposed  binary
tree,  534 

into  a  bit  vector  with  a  superimposed  tree  of 
constant  height,  534 
into  B  trees,  493  497 
into  chained  hash  tables,  258 
into  d  ary  heaps,  167  pr. 
into  direct  address  tables,  254 
into  dynamic  tables,  464  467 
elementary,  465 
into  Fibonacci  heaps,  510  511 
into  heaps,  164 
into  interval  trees,  349 
into  linked  lists,  237  238 
into  open  address  hash  tables,  270 
into  order  statistic  trees,  343 
into  proto  van  Emde  Boas  structures,  544
into  queues,  234 
into  red  black  trees,  315  323
into  stacks,  232 
into  sweep  line  statuses,  1024 
into  treaps,  333  pr. 
into  2  3  4  heaps,  529  pr. 
into  van  Emde  Boas  trees,  552  554 
into  Young  tableaus,  167  pr. 
insertion  sort,  12,  16  20,  25  27 
in  bucket  sort,  201  204
compared  with  merge  sort,  14  ex. 
compared  with  quicksort,  178  ex. 
decision  tree  for,  192  fig. 
in  merge  sort,  39  pr. 
in  quicksort,  185  ex. 
using  binary  search,  39  ex. 

Insertion  Sort,  18,  26,  208  pr.
instance 

of  an  abstract  problem,  1051,  1054 
of  a  problem,  5 

instructions  of  the  RAM  model,  23 
integer  data  type,  23 

integer  linear  programming,  850,  895  pr., 

1101  ex. 

integers  (Z),  1158 

integer  valued  flow,  733 

integrality  theorem,  734 

integral,  to  approximate  summations, 

1154  1156 

integration  of  a  series,  1147
interior  of  a  polygon,  1020  ex. 


interior  point  method,  850,  897 
intermediate  vertex,  693 
internal  node,  1176 
internal  path  length,  1180  ex. 
interpolation  by  a  cubic  spline,  840  pr. 
interpolation  by  a  polynomial,  901,  906  ex.

at  complex  roots  of  unity,  912  913 
intersection 

of  chords,  345  ex. 

determining,  for  a  set  of  line  segments, 
1021  1029,  1047

determining,  for  two  line  segments, 

1017  1019 
of  languages,  1058 
of  sets  (∩),  1159
interval,  348 

fuzzy  sorting  of,  189  pr.

Interval  Delete,  349 
interval  graph,  422  ex. 

Interval  Insert,  349 
Interval  Search,  349,  351 
Interval  Search  Exactly,  354  ex.
interval  tree,  348  354 
interval  trichotomy,  348 
intractability,  1048 
invalid  shift,  985 
inventory  planning,  411  pr.
inverse 

of  a  bijective  function,  1167 
in  a  group,  940 

of  a  matrix,  827  831,  842,  1223,  1225  ex. 
multiplicative,  modulo  n,  949
inversion 

in  a  self  organizing  list,  476  pr. 
in  a  sequence,  41  pr.,  122  ex.,  345  ex. 
inverter,  1070 
invertible  matrix,  1223 
isolated  vertex,  1169 
isomorphic  graphs,  1171 
iterated  function,  63  pr. 
iterated  logarithm  function,  58  59 
Iterative  FFT,  917
Iterative  Tree  Search,  291 
iter  function,  577 

Jarvis’s  march,  1037  1038,  1047 
Jensen’s  inequality,  1199 
Johnson,  704 
Johnson’s  algorithm,  700  706
joining 

of  red  black  trees,  332  pr. 
of  2  3  4  trees,  503  pr. 
joint  probability  density  function,  1197
Josephus  permutation,  355  pr. 

Karmarkar’s  algorithm,  897 
Karp’s  minimum  mean  weight  cycle  algorithm, 
680  pr. 

k  ary  tree,  1179
k  CNF,  1049 

k  coloring,  1103  pr.,  1180  pr.
k  combination,  1185 
k  conjunctive  normal  form,  1049 
kernel  of  a  polygon,  1038  ex. 
key,  16,  147,  162,  229 
dummy,  397 

interpreted  as  a  natural  number,  263 
median,  of  a  B  tree  node,  493 
public,  959,  962 
secret,  959,  962 
static,  277 

keywords,  in  pseudocode,  20  22 

multithreaded,  774,  776  777,  785  786 
“killer  adversary”  for  quicksort,  190 
Kirchhoff’s  current  law,  708 
Kleene  star  (*),  1058 
KMP  algorithm,  1002  1013 
KMP  Matcher,  1005 
knapsack  problem 

fractional,  426,  428  ex. 

0  1,  425,  427  ex.,  1137  pr.,  1139
k  neighbor  tree,  338 
knot,  of  a  spline,  840  pr. 

Knuth  Morris  Pratt  algorithm,  1002  1013 
k  permutation,  126,  1184 
Kraft  inequality,  1180  ex. 

Kruskal’s  algorithm,  631  633,  642
with  integer  edge  weights,  637  ex. 
k  sorted,  207  pr. 
k  string,  1184
k  subset,  1161 
k  substring,  1184 
kth  power,  933  ex.
k  universal  hashing,  284  pr. 

Lagrange’s  formula,  902 


Lagrange’s  theorem,  944 
Lamé’s  theorem,  936
language,  1057 

completeness  of,  1077  ex. 
proving  NP  completeness  of,  1078  1079 
verification  of,  1063 
last  in,  first  out,  232 
see  also  stack 
late  task,  444 
layers 

convex,  1044  pr.
maximal,  1045  pr. 

LCA,  584  pr.

lcm  (least  common  multiple),  939  ex. 
LCS,  7,  390  397,  413
LCS  Length,  394 
leading  submatrix,  833,  839  ex. 
leaf,  1176 

least  common  ancestor,  584  pr. 

least  common  multiple,  939  ex. 

least  squares  approximation,  835  839 

leaving  a  vertex,  1169 

leaving  variable,  867 

Left,  152 

left  child,  1178 

left  child,  right  sibling  representation,  246, 
249  ex. 

Left  Rotate,  313,  353  ex. 

left  rotation,  312 

left  spine,  333  pr. 

left  subtree,  1178 

Legendre  symbol  ((a/p)),  982  pr.

length 

of  a  path,  1170 
of  a  sequence,  1166 
of  a  spine,  333  pr. 
of  a  string,  986,  1184 
level 

of  a  function,  573 
of  a  tree,  1177
level  function,  576 
lexicographically  less  than,  304  pr. 
lexicographic  sorting,  304  pr. 
lg  (binary  logarithm),  56 
lg*  (iterated  logarithm  function),  58  59 
lg^k  (exponentiation  of  logarithms),  56
lg  lg  (composition  of  logarithms),  56 
LIFO  (last  in,  first  out),  232 
see  also  stack
light  edge,  626 
linear  constraint,  846 
linear  dependence,  1223 
linear  equality,  845 
linear  equations 

solving  modular,  946  950 
solving  systems  of,  813  827 
solving  tridiagonal  systems  of,  840  pr. 
linear  function,  26,  845 
linear  independence,  1223 
linear  inequality,  846 

linear  inequality  feasibility  problem,  894  pr. 
linearity  of  expectation,  1198

and  indicator  random  variables,  119 
linearity  of  summations,  1146
linear  order,  1165 
linear  permutation,  1229  pr.
linear  probing,  272 
linear  programming,  7,  843  897 
algorithms  for,  850 
applications  of,  849 
duality  in,  879  886 
ellipsoid  algorithm  for,  850,  897 
finding  an  initial  solution  in,  886  891 
fundamental  theorem  of,  892 
interior  point  methods  for,  850,  897 
Karmarkar’s  algorithm  for,  897 
and  maximum  flow,  860  861 
and  minimum  cost  circulation,  896  pr. 
and  minimum  cost  flow,  861  862 
and  minimum  cost  multicommodity  flow, 
864  ex. 

and  multicommodity  flow,  862  863 
simplex  algorithm  for,  864  879,  896 
and  single  pair  shortest  path,  859  860 
and  single  source  shortest  paths,  664  670, 
863  ex. 

slack  form  for,  854  857 
standard  form  for,  850  854 
see  also  integer  linear  programming,  0  1 
integer  programming 
linear  programming  relaxation,  1125 
linear  search,  22  ex. 
linear  speedup,  780 
line  segment,  1015 
comparable,  1022 
determining  turn  of,  1017 


determining  whether  any  intersect, 

1021  1029,  1047 

determining  whether  two  intersect, 

1017  1019 

link 

of  binomial  trees,  527  pr. 
of  Fibonacci  heap  roots,  513 
of  trees  in  a  disjoint  set  forest,  570  571 
Link,  571 
linked  list,  236  241 

compact,  245  ex.,  250  pr.
deletion  from,  238 
to  implement  disjoint  sets,  564  568 
insertion  into,  237  238 
neighbor  list,  750 
searching,  237,  268  ex. 
self  organizing,  476  pr. 
list,  see  linked  list 
List  Delete,  238 
List  Delete',  238 
List  Insert,  238 
List  Insert',  240 
List  Search,  237 
List  Search',  239 
literal,  1082 

little  oh  notation,  50  51,  64 
little  omega  notation,  51
L_m  distance,  1044  ex.

ln  (natural  logarithm),  56
load  factor 

of  a  dynamic  table,  463 
of  a  hash  table,  258 
load  instruction,  23 
local  variable,  21
logarithm  function  (log),  56  57 
discrete,  955 
iterated  (lg*),  58  59 
logical  parallelism,  777 
logic  gate,  1070 

longest  common  subsequence,  7,  390  397,  413 
longest  palindrome  subsequence,  405  pr. 
LONGEST  PATH,  1060  ex. 

LONGEST  PATH  LENGTH,  1060  ex. 
longest  simple  cycle,  1101  ex. 
longest  simple  path,  1048 
in  an  unweighted  graph,  382 
in  a  weighted  directed  acyclic  graph,  404  pr. 
Lookup  Chain,  388 
loop,  in  pseudocode,  20
parallel,  785  787 
loop  invariant,  18  19

for  breadth  first  search,  595 
for  building  a  heap,  157 
for  consolidating  the  root  list  of  a  Fibonacci 
heap,  517 

for  determining  the  rank  of  an  element  in  an 
order  statistic  tree,  342 
for  Dijkstra’s  algorithm,  660 
and  for  loops,  19  n. 

for  the  generic  minimum  spanning  tree 
method,  625 

for  the  generic  push  relabel  algorithm,  743 

for  Graham’s  scan,  1034 

for  heapsort,  160  ex.

for  Horner’s  rule,  41  pr.

for  increasing  a  key  in  a  heap,  166  ex. 

initialization  of,  19 

for  insertion  sort,  18

maintenance  of,  19 

for  merging,  32 

for  modular  exponentiation,  957 

origin  of,  42 

for  partitioning,  171 

for  Prim’s  algorithm,  636 

for  the  Rabin  Karp  algorithm,  993 

for  randomly  permuting  an  array,  127, 

128  ex. 

for  red  black  tree  insertion,  318 
for  the  relabel  to  front  algorithm,  755 
for  searching  an  interval  tree,  352 
for  the  simplex  algorithm,  872 
for  string  matching  automata,  998,  1000 
and  termination,  19 
low  endpoint  of  an  interval,  348 
lower  bounds 

on  approximations,  1140 
asymptotic,  48 
for  average  sorting,  207  pr. 
on  binomial  coefficients,  1186 
for  comparing  water  jugs,  206  pr.
for  convex  hull,  1038  ex.,  1047 
for  disjoint  set  data  structures,  585 
for  finding  the  minimum,  214 
for  finding  the  predecessor,  560 
for  length  of  an  optimal  traveling  salesman 
tour,  1112  1115 


for  median  finding,  227 

for  merging,  208  pr. 

for  minimum  weight  vertex  cover, 

1124  1126 

for  multithreaded  computations,  780 
and  potential  functions,  478 
for  priority  queue  operations,  531
and  recurrences,  67 

for  simultaneous  minimum  and  maximum, 
215  ex. 

for  size  of  an  optimal  vertex  cover,  1110,
1135  pr.

for  sorting,  191  194,  205  pr.,  211,  531 
for  streaks,  136  138,  142  ex. 
on  summations,  1152,  1154 
lower  median,  213 
lower  square  root  (↓√),  546
lower  triangular  matrix,  1219,  1222  ex.,

1225  ex. 

low  function,  537,  546 
LU  decomposition,  806  pr.,  819  822 
LU  Decomposition,  821 
LUP  decomposition,  806  pr.,  815 
computation  of,  822  825 
of  a  diagonal  matrix,  827  ex. 
in  matrix  inversion,  828 
and  matrix  multiplication,  832  ex. 
of  a  permutation  matrix,  827  ex. 
use  of,  815  819 
LUP  Decomposition,  824 
LUP  Solve,  817

main  memory,  484 
Make  Heap,  505 
Make  Set,  561 

disjoint  set  forest  implementation  of,  571
linked  list  implementation  of,  564 
makespan,  1136  pr.

Make  Tree,  583  pr. 

Manhattan  distance,  225  pr.,  1044  ex.
marked  node,  508,  519  520 
Markov’s  inequality,  1201  ex. 
master  method  for  solving  a  recurrence,  93  97 
master  theorem,  94 
proof  of,  97  106 
matched  vertex,  732 
matching 

bipartite,  732,  763  pr. 
maximal,  1110,  1135  pr.
maximum,  1135  pr.
and  maximum  flow,  732  736,  747  ex. 
perfect,  735  ex. 
of  strings,  985  1013 
weighted  bipartite,  530 
matric  matroid,  437 
matrix,  1217  1229 
addition  of,  1220 
adjacency,  591 

conjugate  transpose  of,  832  ex. 

determinant  of,  1224  1225 

diagonal,  1218 

Hermitian,  832  ex. 

identity,  1218 

incidence,  448  pr.,  593  ex. 

inversion  of,  806  pr.,  827  831,  842

lower  triangular,  1219,  1222  ex.,  1225  ex. 

multiplication  of,  see  matrix  multiplication 

negative  of,  1220 

permutation,  1220,  1222  ex. 

predecessor,  685 

product  of,  with  a  vector,  785  787,  789  790, 
792  ex. 

pseudoinverse  of,  837 
scalar  multiple  of,  1220 
subtraction  of,  1221 
symmetric,  1220 

symmetric  positive  definite,  832  835,  842 
Toeplitz,  921  pr. 
transpose  of,  797  ex.,  1217 
transpose  of,  multithreaded,  792  ex. 
tridiagonal,  1219 
unit  lower  triangular,  1219 
unit  upper  triangular,  1219 
upper  triangular,  1219,  1225  ex. 
Vandermonde,  902,  1226  pr. 
matrix  chain  multiplication,  370  378 
Matrix  Chain  Multiply 
Matrix  Chain  Order,  375 
matrix  multiplication,  75  83,  1221
for  all  pairs  shortest  paths,  686  693, 

706  707 
boolean,  832  ex. 

and  computing  the  determinant,  832  ex. 
divide  and  conquer  method  for,  76  83 
and  LUP  decomposition,  832  ex. 
and  matrix  inversion,  828  831,  842 


multithreaded  algorithm  for,  792  797, 

806  pr. 

Pan’s  method  for,  82  ex. 

Strassen’s  algorithm  for,  79  83,  111  112
Matrix  Multiply,  371 
matrix  vector  multiplication,  multithreaded, 
785  787,  792  ex. 
with  race,  789  790 
matroid,  437  443,  448  pr.,  450,  642 
for  task  scheduling,  443  446 
Mat  Vec,  785 
Mat  Vec  Main  Loop,  786 
Mat  Vec  Wrong,  790 
MAX  CNF  satisfiability,  1127  ex.

MAX  CUT  problem,  1127  ex.

Max  Flow  By  Scaling,  763  pr.
max  flow  min  cut  theorem,  723 
max  heap,  152 
building,  156  159 
d  ary,  167  pr. 
deletion  from,  166  ex. 
extracting  the  maximum  key  from,  163 
in  heapsort,  159  162
increasing  a  key  in,  163  164 
insertion  into,  164 
maximum  key  of,  163 
as  a  max  priority  queue,  162  166 
mergeable,  250  n.,  481  n.,  505  n. 

Max  Heapify,  154 
Max  Heap  Insert,  164 
building  a  heap  with,  166  pr. 
max  heap  property,  152 
maintenance  of,  154  156 
maximal  element,  of  a  partially  ordered  set, 
1165 

maximal  layers,  1045  pr. 
maximal  matching,  1110,  1135  pr. 
maximal  point,  1045  pr. 
maximal  subset,  in  a  matroid,  438 
maximization  linear  program,  846 

and  minimization  linear  programs,  852 
maximum,  213 

in  binary  search  trees,  291 
of  a  binomial  distribution,  1207  ex. 
in  a  bit  vector  with  a  superimposed  binary 
tree,  533 

in  a  bit  vector  with  a  superimposed  tree  of 
constant  height,  535 
finding,  214  215
in  heaps,  163 

in  order  statistic  trees,  347  ex. 
in  proto  van  Emde  Boas  structures,  544  ex. 
in  red  black  trees,  311 
in  van  Emde  Boas  trees,  550 
Maximum,  162  163,  230 
maximum  bipartite  matching,  732  736, 

747  ex.,  766 

Hopcroft  Karp  algorithm  for,  763  pr. 
maximum  degree,  in  a  Fibonacci  heap,  509, 
523  526 

maximum  flow,  708  766 

Edmonds  Karp  algorithm  for,  727  730 
Ford  Fulkerson  method  for,  714  731,  765 
as  a  linear  program,  860  861 
and  maximum  bipartite  matching,  732  736, 
747  ex. 

push  relabel  algorithms  for,  736  760,  765 
relabel  to  front  algorithm  for,  748  760 
scaling  algorithm  for,  762  pr.,  765 
updating,  762  pr. 
maximum  matching,  1135  pr.
maximum  spanning  tree,  1137  pr.
maximum  subarray  problem,  68  75,  111 
max  priority  queue,  162 
MAX  3  CNF  satisfiability,  1123  1124,  1139 
Maybe  MST  A,  641  pr. 

Maybe  MST  B,  641  pr.

Maybe  MST  C,  641  pr.
mean,  see  expected  value 
mean  weight  of  a  cycle,  680  pr. 
median,  213  227 

multithreaded  algorithm  for,  805  ex. 
of  sorted  lists,  223  ex. 
of  two  sorted  lists,  804  ex. 
weighted,  225  pr. 
median  key,  of  a  B  tree  node,  493 
median  of  3  method,  188  pr.
member  of  a  set  (∈),  1158
membership 

in  proto  van  Emde  Boas  structures,  540  541 
in  van  Emde  Boas  trees,  550
memoization,  365,  387  389 
Memoized  Cut  Rod,  365 
Memoized  Cut  Rod  Aux,  366 
Memoized  Matrix  Chain,  388 
memory,  484 


memory  hierarchy,  24 
Merge,  31 

mergeable  heap,  481,  505 
binomial  heaps,  527  pr. 
linked  list  implementation  of,  250  pr. 
relaxed  heaps,  530 

running  times  of  operations  on,  506  fig. 

2  3  4  heaps,  529  pr. 
see  also  Fibonacci  heap 
mergeable  max  heap,  250  n.,  481  n.,  505  n.
mergeable  min  heap,  250  n.,  481  n.,  505 
Merge  Lists,  1129 
merge  sort,  12,  30  37 

compared  with  insertion  sort,  14  ex. 
multithreaded  algorithm  for,  797  805,  812 
use  of  insertion  sort  in,  39  pr. 

Merge  Sort,  34 
Merge  Sort',  797 
merging 

of  k  sorted  lists,  166  ex. 
lower  bounds  for,  208  pr. 
multithreaded  algorithm  for,  798  801 
of  two  sorted  arrays,  30 
Miller  Rabin,  970 
Miller  Rabin  primality  test,  968  975,  983 
Min  Gap,  354  ex. 
min  heap,  153 

analyzed  by  potential  method,  462  ex. 
building,  156  159 
d  ary,  706  pr. 

in  Dijkstra’s  algorithm,  662 
in  Huffman’s  algorithm,  433 
in  Johnson’s  algorithm,  704 
mergeable,  250  n.,  481  n.,  505 
as  a  min  priority  queue,  165  ex. 
in  Prim’s  algorithm,  636 
Min  Heapify,  156  ex. 

Min  Heap  Insert,  165  ex. 
min  heap  ordering,  507 
min  heap  property,  153,  507 
maintenance  of,  156  ex. 
in  treaps,  333  pr. 

vs.  binary  search  tree  property,  289  ex. 
minimization  linear  program,  846 

and  maximization  linear  programs,  852 
minimum,  213 

in  binary  search  trees,  291 
in  a  bit  vector  with  a  superimposed  binary
tree,  533 

in  a  bit  vector  with  a  superimposed  tree  of 
constant  height,  535 
in  B  trees,  497  ex. 
in  Fibonacci  heaps,  511 
finding,  214  215 
off  line,  582  pr. 
in  order  statistic  trees,  347  ex. 
in  proto  van  Emde  Boas  structures,  541  542
in  red  black  trees,  311 
in  2  3  4  heaps,  529  pr. 
in  van  Emde  Boas  trees,  550 
Minimum,  162,  214,  230,  505
minimum  cost  circulation,  896  pr. 
minimum  cost  flow,  861  862
minimum  cost  multicommodity  flow,  864  ex. 
minimum  cost  spanning  tree,  see  minimum 
spanning  tree 
minimum  cut,  721,  731  ex. 
minimum  degree,  of  a  B  tree,  489 
minimum  mean  weight  cycle,  680  pr. 
minimum  node,  of  a  Fibonacci  heap,  508 
minimum  path  cover,  761  pr. 
minimum  spanning  tree,  624  642 
in  approximation  algorithm  for 
traveling  salesman  problem,  1112 
Borůvka’s  algorithm  for,  641
on  dynamic  graphs,  637  ex. 
generic  method  for,  625  630 
Kruskal’s  algorithm  for,  631  633
Prim’s  algorithm  for,  634  636 
relation  to  matroids,  437,  439  440 
second  best,  638  pr. 

minimum  weight  spanning  tree,  see  minimum 
spanning  tree 

minimum  weight  vertex  cover,  1124  1127,
1139 

minor  of  a  matrix,  1224 
min  priority  queue,  162 

in  constructing  Huffman  codes,  431
in  Dijkstra’s  algorithm,  661 
in  Prim’s  algorithm,  634,  636 
miss,  449  pr. 
missing  child,  1178 
mod,  54,  928 
modifying  operation,  230 
modular  arithmetic,  54,  923  pr.,  939  946 


modular  equivalence,  54,  1165  ex. 
modular  exponentiation,  956 
Modular  Exponentiation,  957 
modular  linear  equations,  946  950 
Modular  Linear  Equation  Solver, 
949 

modulo,  54,  928 
Monge  array,  110  pr.
monotone  sequence,  168 
monotonically  decreasing,  53 
monotonically  increasing,  53 
Monty  Hall  problem,  1195  ex.
move  to  front  heuristic,  476  pr.,  478
MST  Kruskal,  631
MST  Prim,  634 
MST  Reduce,  639  pr.
much  greater  than  (≫),  574
much  less  than  (≪),  783
multicommodity  flow,  862  863 
minimum  cost,  864  ex. 
multicore  computer,  772 
multidimensional  fast  Fourier  transform, 

921  pr. 

multigraph,  1172 

converting  to  equivalent  undirected  graph, 
593  ex. 
multiple,  927 

of  an  element  modulo  n,  946  950
least  common,  939  ex. 
scalar,  1220 
multiple  assignment,  21 
multiple  sources  and  sinks,  712 
multiplication 

of  complex  numbers,  83  ex. 
divide  and  conquer  method  for,  920  pr. 
of  matrices,  see  matrix  multiplication 
of  a  matrix  chain,  370  378 
matrix  vector,  multithreaded,  785  787, 
789  790,  792  ex. 
modulo  n  (·_n),  940
of  polynomials,  899 
multiplication  method,  263  264 
multiplicative  group  modulo  n,  941 
multiplicative  inverse,  modulo  n,  949 
multiply  instruction,  23 
Multipop,  453 
multiprocessor,  772 
Multipush,  456  ex.
multiset,  1158  n.

multithreaded  algorithm,  10,  772  812

for  computing  Fibonacci  numbers,  774  780 
for  fast  Fourier  transform,  804  ex. 

Floyd  Warshall  algorithm,  797  ex. 

for  LU  decomposition,  806  pr. 

for  LUP  decomposition,  806  pr. 

for  matrix  inversion,  806  pr. 

for  matrix  multiplication,  792  797,  806  pr. 

for  matrix  transpose,  792  ex.,  797  ex. 

for  matrix  vector  product,  785  787, 

789  790,  792  ex. 
for  median,  805  ex. 
for  merge  sorting,  797  805,  812 
for  merging,  798  801 
for  order  statistics,  805  ex. 
for  partitioning,  804  ex. 
for  prefix  computation,  807  pr. 
for  quicksort,  811  pr.
for  reduction,  807  pr. 
for  a  simple  stencil  calculation,  809  pr. 
for  solving  systems  of  linear  equations, 

806  pr. 

Strassen’s  algorithm,  795  796 
multithreaded  composition,  784  fig. 
multithreaded  computation,  777 
multithreaded  scheduling,  781  783 
mutually  exclusive  events,  1190 
mutually  independent  events,  1193 

N  (set  of  natural  numbers),  1158 
naive  algorithm,  for  string  matching,  988  990 
Naive  String  Matcher,  988 
natural  cubic  spline,  840  pr. 
natural  numbers  (N),  1158 
keys  interpreted  as,  263 
negative  of  a  matrix,  1220 
negative  weight  cycle 

and  difference  constraints,  667 
and  relaxation,  677  ex. 
and  shortest  paths,  645,  653  654,  692  ex., 
700  ex. 

negative  weight  edges,  645  646 
neighbor,  1172 
neighborhood,  735  ex. 
neighbor  list,  750 
nested  parallelism,  776,  805  pr. 
nesting  boxes,  678  pr. 


net  flow  across  a  cut,  720 
network 

admissible,  749  750 
flow,  see  flow  network 
residual,  715  719 
for  sorting,  811 
Next  To  Top,  1031 
nil,  21 
node,  1176 
see  also  vertex 
nonbasic  variable,  855 

nondeterministic  multithreaded  algorithm,  787 
nondeterministic  polynomial  time,  1064  n.
see  also  NP 

nonhamiltonian  graph,  1061 

noninstance,  1056  n. 

noninvertible  matrix,  1223 

nonnegativity  constraint,  851,  853 

nonoverlappable  string  pattern,  1002  ex. 

nonsaturating  push,  739,  745 

nonsingular  matrix,  1223 

nontrivial  power,  933  ex. 

nontrivial  square  root  of  1,  modulo  n,  956

no  path  property,  650,  672 

normal  equation,  837 

norm  of  a  vector,  1222 

NOT  function  (¬),  1071

not  a  set  member  (∉),  1158

not  equivalent  (≢),  54

NOT  gate,  1070 

NP  (complexity  class),  1049,  1064,  1066  ex.,
1105 

NPC  (complexity  class),  1050,  1069 
NP  complete,  1050,  1069 
NP  completeness,  9  10,  1048  1105
of  the  circuit  satisfiability  problem, 

1070  1077 

of  the  clique  problem,  1086  1089,  1105
of  determining  whether  a  boolean  formula  is 
a  tautology,  1086  ex. 
of  the  formula  satisfiability  problem, 

1079  1081,  1105 

of  the  graph  coloring  problem,  1103  pr.
of  the  half  3  CNF  satisfiability  problem, 
1101  ex. 

of  the  hamiltonian  cycle  problem, 

1091  1096,  1105

of  the  hamiltonian  path  problem,  1101  ex. 
of  the  independent  set  problem,  1101  pr.
of  integer  linear  programming,  1101  ex. 
of  the  longest  simple  cycle  problem, 

1101  ex. 

proving,  of  a  language,  1078  1079 
of  scheduling  with  profits  and  deadlines, 

1104  pr.

of  the  set  covering  problem,  1122  ex. 
of  the  set  partition  problem,  1101  ex.
of  the  subgraph  isomorphism  problem, 

1100  ex. 

of  the  subset  sum  problem,  1097  1100
of  the  3  CNF  satisfiability  problem, 

1082  1085,  1105

of  the  traveling  salesman  problem, 

1096  1097 

of  the  vertex  cover  problem,  1089  1091, 

1105 

of  0-1  integer  programming,  1100  ex.

NP  hard,  1069 

n  set,  1161 

n  tuple,  1162 

null  event,  1190 

null  tree,  1178 

null  vector,  1224 

number  field  sieve,  984 

numerical  stability,  813,  815,  842 

n  vector,  1218 

o  notation,  50  51,  64
O  notation,  45  fig.,  47  48,  64
O'  notation,  62  pr. 

Õ  notation,  62  pr.
object,  21 

allocation  and  freeing  of,  243  244 
array  implementation  of,  241  246 
passing  as  parameter,  21
objective  function,  664,  847,  851 
objective  value,  847,  851 
oblivious  compare  exchange  algorithm,  208  pr. 
occurrence  of  a  pattern,  985 
Off  Line  Minimum,  583  pr. 
off  line  problem 
caching,  449  pr. 

least  common  ancestors,  584  pr. 
minimum,  582  pr. 

Omega  notation,  45  fig.,  48  49,  64 
1  approximation  algorithm,  1107 


one  pass  method,  585 

one  to  one  correspondence,  1167 

one  to  one  function,  1167 

on  line  convex  hull  problem,  1039  ex. 

on  line  hiring  problem,  139  141 

On  Line  Maximum,  140 

on  line  multithreaded  scheduler,  781

On  Segment,  1018 

onto  function,  1167 

open  address  hash  table,  269  277 

with  double  hashing,  272  274,  277  ex. 
with  linear  probing,  272 
with  quadratic  probing,  272,  283  pr. 
open  interval,  348 
OpenMP,  774 

optimal  binary  search  tree,  397  404,  413 

Optimal  BST,  402

optimal  objective  value,  851

optimal  solution,  851

optimal  subset,  of  a  matroid,  439 

optimal  substructure 

of  activity  selection,  416 

of  binary  search  trees,  399  400 

in  dynamic  programming,  379  384 

of  the  fractional  knapsack  problem,  426 

in  greedy  algorithms,  425 

of  Huffman  codes,  435 

of  longest  common  subsequences,  392  393 

of  matrix  chain  multiplication,  373 

of  rod  cutting,  362 

of  shortest  paths,  644  645,  687,  693  694 
of  unweighted  shortest  paths,  382 
of  weighted  matroids,  442 
of  the  0-1  knapsack  problem,  426
optimal  vertex  cover,  1108
optimization  problem,  359,  1050,  1054 
approximation  algorithms  for,  10, 

1106  1140 

and  decision  problems,  1051 
OR  function  (∨),  697,  1071
order 

of  a  group,  945 
linear,  1165 
partial,  1165 
total,  1165 
ordered  pair,  1161 
ordered  tree,  1177 
order  of  growth,  28 
order  statistics,  213  227
dynamic,  339  345 
multithreaded  algorithm  for,  805  ex. 
order  statistic  tree,  339  345 
querying,  347  ex. 

OR  gate,  1070 

origin,  1015 

or,  in  pseudocode,  22 

orthonormal,  842 

OS  Key  Rank,  344  ex.

OS  Rank,  342 
OS  Select,  341 
out  degree,  1169 
outer  product,  1222 
output 

of  an  algorithm,  5 

of  a  combinational  circuit,  1071 

of  a  logic  gate,  1070 

overdetermined  system  of  linear  equations,  814 
overflow 

of  a  queue,  235 
of  a  stack,  233 
overflowing  vertex,  736 
discharge  of,  751 
overlapping  intervals,  348 
finding  all,  354  ex. 
point  of  maximum  overlap,  354  pr. 
overlapping  rectangles,  354  ex. 
overlapping  subproblems,  384  386 
overlapping  suffix  lemma,  987 

P  (complexity  class),  1049,  1055,  1059, 

1061  ex.,  1105 

package  wrapping,  1037,  1047 
page  on  a  disk,  486,  499  ex.,  502  pr. 
pair,  ordered,  1161 
pairwise  disjoint  sets,  1161 
pairwise  independence,  1193 
pairwise  relatively  prime,  931 
palindrome,  405  pr. 

Pan’s  method  for  matrix  multiplication,  82  ex. 
parallel  algorithm,  10,  772 

see  also  multithreaded  algorithm 
parallel  computer,  772 
ideal,  779 

parallel  for,  in  pseudocode,  785  786 
parallelism 
logical,  777


of  a  multithreaded  computation,  780 
nested,  776 

of  a  randomized  multithreaded  algorithm, 
811  pr. 

parallel  loop,  785  787,  805  pr. 
parallel  machine  scheduling  problem,  1136  pr. 
parallel  prefix,  807  pr. 
parallel  random  access  machine,  811 
parallel  slackness,  781 
rule  of  thumb,  783 

parallel,  strands  being  logically  in,  778 
parameter,  21 

costs  of  passing,  107  pr. 
parent 

in  a  breadth  first  tree,  594 
in  a  multithreaded  computation,  776 
in  a  rooted  tree,  1176 
Parent,  152 

parenthesis  structure  of  depth  first  search,  606 

parenthesis  theorem,  606 

parenthesization  of  a  matrix  chain  product,  370 

parse  tree,  1082 

partially  ordered  set,  1165 

partial  order,  1165 

Partition,  171 

Partition',  186  pr.

partition  function,  361  n. 

partitioning,  171  173 

around  median  of  3  elements,  185  ex. 

Hoare’s  method  for,  185  pr. 
multithreaded  algorithm  for,  804  ex. 
randomized,  179 
partition  of  a  set,  1161,  1164 
Pascal’s  triangle,  1188  ex.
path,  1170 

augmenting,  719  720,  763  pr. 
critical,  657 
find,  569 

hamiltonian,  1066  ex. 
longest,  382,  1048 
shortest,  see  shortest  paths 
simple,  1170 
weight  of,  643 
PATH,  1051,  1058 
path  compression,  569 
path  cover,  761  pr. 

path  length,  of  a  tree,  304  pr.,  1180  ex.
path  relaxation  property,  650,  673 
pattern,  in  string  matching,  985
nonoverlappable,  1002  ex. 
pattern  matching,  see  string  matching 
penalty,  444 

perfect  hashing,  277  282,  285 
perfect  linear  speedup,  780 
perfect  matching,  735  ex. 
permutation,  1167 
bit  reversal,  472  pr.,  918 
Josephus,  355  pr. 
k  permutation,  126,  1184 
linear,  1229  pr.
in  place,  126 
random,  124  128 
of  a  set,  1184
uniform  random,  116,  125 
permutation  matrix,  1220,  1222  ex.,  1226 ex. 

LUP  decomposition  of,  827  ex. 

Permute  By  Cyclic,  129 ex. 

Permute  By  Sorting,  125 
Permute  With  All,  129  ex. 

Permute  Without  Identity,  128  ex. 
persistent  data  structure,  331  pr.,  482 
Persistent  Tree  Insert,  331  pr. 

PERT  chart,  657,  657  ex. 

P  Fib,  776

phase,  of  the  relabel  to  front  algorithm,  758 
phi  function  (φ(n)),  943
Pisano  Delete,  526 pr. 
pivot 

in  linear  programming,  867,  869  870, 

878  ex. 

in  LU  decomposition,  821 
in  quicksort,  171 
Pivot,  869 
platter,  485 

P  Matrix  Multiply  Recursive,  794 
P  Merge,  800 
P  Merge  Sort, 803 
pointer,  21 

array  implementation  of,  241  246 
trailing,  295 

point  value  representation,  901
polar  angle,  1020  ex. 

Pollard’s  rho  heuristic,  976  980,  980  ex.,  984 
Pollard  Rho,  976 
polygon,  1020  ex. 
kernel  of,  1038  ex. 


star  shaped,  1038  ex. 
polylogarithmically  bounded,  57 
polynomial,  55,  898 
addition  of,  898 
asymptotic  behavior  of,  61  pr. 
coefficient  representation  of,  900 
derivatives  of,  922  pr. 
evaluation  of,  41  pr.,  900,  905  ex.,  923  pr. 
interpolation  by,  901,  906  ex. 
multiplication  of,  899,  903  905,  920  pr. 
point  value  representation  of,  901
polynomial  growth  condition,  113 
polynomially  bounded,  55 
polynomially  related,  1056 
polynomial  time  acceptance,  1058 
polynomial  time  algorithm,  927,  1048 
polynomial  time  approximation  scheme,  1107 
for  maximum  clique,  1134  pr.
polynomial  time  computability,  1056 
polynomial  time  decision,  1059 
polynomial  time  reducibility  (≤_P),  1067,

1077  ex. 

polynomial  time  solvability,  1055 
polynomial  time  verification,  1061  1066 
POP,  233,  452 

pop  from  a  run  time  stack,  188  pr. 
positional  tree,  1178 
positive  definite  matrix,  1225 
post  office  location  problem,  225  pr. 
postorder  tree  walk,  287 
potential  function,  459 
for  lower  bounds,  478 
potential  method,  459  463 
for  binary  counters,  461  462 
for  disjoint  set  data  structures,  575  581, 
582  ex. 

for  dynamic  tables,  466  471
for  Fibonacci  heaps,  509  512,  517  518,
520  522 

for  the  generic  push  relabel  algorithm,  746 
for  min  heaps,  462  ex. 
for  restructuring  red  black  trees,  474  pr. 
for  self  organizing  lists  with  move  to  front, 
476  pr. 

for  stack  operations,  460  461 
potential,  of  a  data  structure,  459 
power 

of  an  element,  modulo  n,  954  958 
kth,  933  ex.
nontrivial,  933  ex. 
power  series,  108  pr. 
power  set,  1161 

Pr{  }  (probability  distribution),  1190

PRAM,  811 

predecessor 

in  binary  search  trees,  291  292 
in  a  bit  vector  with  a  superimposed  binary 
tree,  534 

in  a  bit  vector  with  a  superimposed  tree  of 
constant  height,  535 
in  breadth  first  trees,  594 
in  B  trees,  497  ex. 
in  linked  lists,  236 
in  order  statistic  trees,  347  ex. 
in  proto  van  Emde  Boas  structures,  544  ex.
in  red  black  trees,  311 
in  shortest  paths  trees,  647 
in  van  Emde  Boas  trees,  551  552
Predecessor, 230 
predecessor  matrix,  685 
predecessor  subgraph 

in  all  pairs  shortest  paths,  685 
in  breadth  first  search,  600 
in  depth  first  search,  603 
in  single  source  shortest  paths,  647 
predecessor  subgraph  property,  650,  676 
preemption,  447  pr. 
prefix 

of  a  sequence,  392 
of  a  string  (⊏),  986
prefix  code,  429 
prefix  computation,  807  pr. 
prefix  function,  1003  1004 
prefix  function  iteration  lemma,  1007 
preflow,  736,  765 
preimage  of  a  matrix,  1228  pr. 
preorder,  total,  1165 
preorder  tree  walk,  287 
presorting,  1043 
Prim’s  algorithm,  634  636,  642 
with  an  adjacency  matrix,  637  ex. 
in  approximation  algorithm  for 
traveling  salesman  problem,  1112 
implemented  with  a  Fibonacci  heap,  636 
implemented  with  a  min  heap,  636 
with  integer  edge  weights,  637  ex. 


similarity  to  Dijkstra’s  algorithm,  634,  662 
for  sparse  graphs,  638  pr. 
primality  testing,  965  975,  983 
Miller  Rabin  test,  968  975,  983 
pseudoprimality  testing,  966  968 
primal  linear  program,  880 
primary  clustering,  272 
primary  memory,  484 
prime  distribution  function,  965 
prime  number,  928 
density  of,  965  966 
prime  number  theorem,  965 
primitive  root  of  Z*_n,  955
principal  root  of  unity,  907 
principle  of  inclusion  and  exclusion,  1163  ex. 
Print  All  Pairs  Shortest  Path, 685 
Print  Cut  Rod  Solution,  369 
Print  Intersecting  Segments,  1028  ex. 
Print  LCS,  395
Print  Optimal  Parens,  377 
Print  Path,  601 
Print  Set, 572 ex. 
priority  queue,  162  166 

in  constructing  Huffman  codes,  431
in  Dijkstra’s  algorithm,  661 
heap  implementation  of,  162  166 
lower  bounds  for,  531 
max  priority  queue,  162 
min  priority  queue,  162,  165  ex. 
with  monotone  extractions,  168 
in  Prim’s  algorithm,  634,  636 
proto  van  Emde  Boas  structure 
implementation  of,  538  545 
van  Emde  Boas  tree  implementation  of, 

531  560 

see  also  binary  search  tree,  binomial  heap, 
Fibonacci  heap 

probabilistically  checkable  proof,  1105,  1140 
probabilistic  analysis,  115  116,  130  142
of  approximation  algorithm  for 
MAX  3  CNF  satisfiability,  1124 
and  average  inputs,  28 
of  average  node  depth  in  a  randomly  built 
binary  search  tree,  304  pr. 
of  balls  and  bins,  133  134 
of  birthday  paradox,  130  133 
of  bucket  sort,  201  204,  204  ex. 
of  collisions,  261  ex.,  282  ex. 
of  convex  hull  over  a  sparse  hulled
distribution,  1046  pr. 
of  file  comparison,  995  ex. 
of  fuzzy  sorting  of  intervals,  189  pr.
of  hashing  with  chaining,  258  260 
of  height  of  a  randomly  built  binary  search 
tree,  299  303 

of  hiring  problem,  120  121,  139  141
of  insertion  into  a  binary  search  tree  with 
equal  keys,  303  pr. 

of  longest  probe  bound  for  hashing,  282  pr. 
of  lower  bound  for  sorting,  205  pr. 
of  Miller  Rabin  primality  test,  971  975 
and  multithreaded  algorithms,  811  pr.
of  on  line  hiring  problem,  139  141 
of  open  address  hashing,  274  276,  277  ex. 
of  partitioning,  179  ex.,  185  ex.,  187  188  pr.
of  perfect  hashing,  279  282 
of  Pollard’s  rho  heuristic,  977  980 
of  probabilistic  counting,  143  pr. 
of  quicksort,  181  184,  187  188  pr.,  303  ex.
of  Rabin  Karp  algorithm,  994 
and  randomized  algorithms,  123  124 
of  randomized  selection,  217  219,  226  pr. 
of  searching  a  compact  list,  250  pr. 
of  slot  size  bound  for  chaining,  283  pr. 
of  sorting  points  by  distance  from  origin, 
204  ex. 

of  streaks,  135  139 
of  universal  hashing,  265  268 
probabilistic  counting,  143  pr. 
probability,  1189  1196 
probability  density  function,  1196 
probability  distribution,  1190
probability  distribution  function,  204  ex. 
probe  sequence,  270 
probing,  270,  282  pr. 

see  also  linear  probing,  quadratic  probing, 
double  hashing 
problem 

abstract,  1054 
computational,  5  6 
concrete,  1055 
decision,  1051,  1054 
intractable,  1048 
optimization,  359,  1050,  1054 
solution  to,  6,  1054  1055 
tractable,  1048 


procedure,  6,  16  17 
product  (∏),  1148
Cartesian,  1162 
cross,  1016 
inner,  1222 

of  matrices,  1221,  1226  ex.
outer,  1222 
of  polynomials,  899 
rule  of,  1184 
scalar  flow,  714  ex. 
professional  wrestler,  602  ex. 
program  counter,  1073 
programming,  see  dynamic  programming, 
linear  programming 
proper  ancestor,  1176 
proper  descendant,  1176 
proper  subgroup,  944 
proper  subset  (⊂),  1159
proto  van  Emde  Boas  structure,  538  545
cluster  in,  538 

compared  with  van  Emde  Boas  trees,  547 
deletion  from,  544 
insertion  into,  544 
maximum  in,  544  ex. 
membership  in,  540  541 
minimum  in,  541  542 
predecessor  in,  544  ex. 
successor  in,  543  544 
summary  in,  540 
Proto  vEB  Insert,  544 
Proto  vEB  Member,  541 
Proto  vEB  Minimum,  542 
proto  vEB  structure,  see  proto  van  Emde  Boas 
structure 

Proto  vEB  Successor,  543 
prune  and  search  method,  1030 
pruning  a  Fibonacci  heap,  529  pr. 

P  Scan  1,  808  pr.

P  Scan  2, 808  pr. 

P  Scan  3,  809  pr.

P  Scan  Down,  809  pr.

P  Scan  Up,  809 pr. 
pseudocode,  16,  20  22 
pseudoinverse,  837 
pseudoprime,  966  968 
Pseudoprime,  967 
pseudorandom  number  generator,  117 
P  Square  Matrix  Multiply, 793 
P  Transpose,  792  ex.

public  key,  959,  962 

public  key  cryptosystem,  958  965,  983 

Push 

push  relabel  operation,  739 
stack  operation,  233,  452 
push  onto  a  run  time  stack,  188  pr.
push  operation  (in  push  relabel  algorithms), 
738  739 

nonsaturating,  739,  745 
saturating,  739,  745 
push  relabel  algorithm,  736  760,  765 
basic  operations  in,  738  740 
by  discharging  an  overflowing  vertex  of 
maximum  height,  760  ex. 
to  find  a  maximum  bipartite  matching, 

747  ex. 

gap  heuristic  for,  760  ex.,  766 

generic  algorithm,  740  748 

with  a  queue  of  overflowing  vertices,  759  ex. 

relabel  to  front  algorithm,  748  760 

quadratic  function,  27 
quadratic  probing,  272,  283  pr. 
quadratic  residue,  982  pr. 
quantile,  223  ex. 
query,  230 
queue,  232,  234  235 

in  breadth  first  search,  595 
implemented  by  stacks,  236  ex. 
linked  list  implementation  of,  240  ex. 
priority,  see  priority  queue 
in  push  relabel  algorithms,  759  ex. 
quicksort,  170  190 
analysis  of,  174  185 
average  case  analysis  of,  181  184
compared  with  insertion  sort,  178  ex. 
compared  with  radix  sort,  199 
with  equal  element  values,  186  pr.
good  worst  case  implementation  of,  223  ex. 
“killer  adversary”  for,  190 
with  median  of  3  method,  188  pr.
multithreaded  algorithm  for,  811  pr.
randomized  version  of,  179  180,  187  pr. 
stack  depth  of,  188  pr.
tail  recursive  version  of,  188  pr.
use  of  insertion  sort  in,  185  ex. 
worst  case  analysis  of,  180  181 


Quicksort,  171 
Quicksort',  186  pr. 
quotient,  928 

R  (set  of  real  numbers),  1158 

Rabin  Karp  algorithm,  990  995,  1013 

Rabin  Karp  Matcher,  993 

race,  787  790 

Race  Example,  788 

radix  sort,  197  200 

compared  with  quicksort,  199 
Radix  Sort,  198 
radix  tree,  304  pr. 

RAM,  23  24 
Random,  117 

random  access  machine,  23  24 
parallel,  811

randomized  algorithm,  116  117,  122  130
and  average  inputs,  28 
comparison  sort,  205  pr. 
for  fuzzy  sorting  of  intervals,  189  pr. 
for  hiring  problem,  123  124 
for  insertion  into  a  binary  search  tree  with 
equal  keys,  303  pr. 

for  MAX  3  CNF  satisfiability,  1123  1124, 
1139 

Miller  Rabin  primality  test,  968  975,  983 
multithreaded,  811  pr.
for  partitioning,  179,  185  ex.,  187  188  pr.
for  permuting  an  array,  124  128 
Pollard’s  rho  heuristic,  976  980,  980  ex., 
984 

and  probabilistic  analysis,  123  124 
quicksort,  179  180,  185  ex.,  187  188  pr. 
randomized  rounding,  1139 
for  searching  a  compact  list,  250  pr. 
for  selection,  215  220 
universal  hashing,  265  268 
worst  case  performance  of,  180  ex. 
Randomized  Hire  Assistant,  124 
Randomized  Partition,  179 
Randomized  Quicksort,  179,  303  ex. 
relation  to  randomly  built  binary  search 
trees,  304  pr. 

randomized  rounding,  1139 
Randomized  Select, 216 
Randomize  In  Place,  126 
randomly  built  binary  search  tree,  299  303,

304  pr. 

random  number  generator,  117 
random  permutation,  124  128 
uniform,  116,  125 
Random  Sample,  130 ex. 
random  sampling,  129  ex.,  179 
Random  Search,  143  pr. 
random  variable,  1196  1201

indicator,  see  indicator  random  variable 
range,  1167 

of  a  matrix,  1228  pr. 
rank 

column,  1223 
full,  1223 

of  a  matrix,  1223,  1226  ex. 
of  a  node  in  a  disjoint  set  forest,  569,  575, 
581  ex. 

of  a  number  in  an  ordered  set,  300,  339 
in  order  statistic  trees,  341  343,  344  345  ex. 
row,  1223 
rate  of  growth,  28 
ray,  1021  ex. 

RB  Delete,  324 
RB  Delete  Fixup,  326 
RB  Enumerate,  348  ex. 

RB  Insert,  315 
RB  Insert  Fixup,  316 
RB  Join,  332 pr. 

RB  Transplant,  323 
reachability  in  a  graph  (⇝),  1170
real  numbers  (R),  1158 

reconstructing  an  optimal  solution,  in  dynamic 
programming,  387 
record,  147 
rectangle,  354  ex. 
recurrence,  34,  65  67,  83  113

solution  by  Akra  Bazzi  method,  112  113 
solution  by  master  method,  93  97 
solution  by  recursion  tree  method,  88  93 
solution  by  substitution  method,  83  88 
recurrence  equation,  see  recurrence 
recursion,  30 
recursion  tree,  37,  88  93 

in  proof  of  master  theorem,  98  100 
and  the  substitution  method,  91  92 
Recursive  Activity  Selector,  419 
recursive  case,  65 


Recursive  FFT,  911
Recursive  Matrix  Chain,  385 
red  black  tree,  308  338 
augmentation  of,  346  347 
compared  with  B  trees,  484,  490 
deletion  from,  323  330 
in  determining  whether  any  line  segments 
intersect,  1024 

for  enumerating  keys  in  a  range,  348  ex. 

height  of,  309 

insertion  into,  315  323 

joining  of,  332  pr. 

maximum  key  of,  311

minimum  key  of,  311

predecessor  in,  311 

properties  of,  308  312 

relaxed,  311  ex. 

restructuring,  474  pr. 

rotation  in,  312  314 

searching  in,  311

successor  in,  311

see  also  interval  tree,  order  statistic  tree 
Reduce,  807  pr. 

reduced  space  van  Emde  Boas  tree,  557  pr. 

reducibility,  1067  1068 

reduction  algorithm,  1052,  1067 

reduction  function,  1067 

reduction,  of  an  array,  807  pr. 

reflexive  relation,  1163 

reflexivity  of  asymptotic  notation,  51

region,  feasible,  847 

regularity  condition,  95 

rejection 

by  an  algorithm,  1058 
by  a  finite  automaton,  996 
Relabel,  740 
relabeled  vertex,  740 

relabel  operation,  in  push  relabel  algorithms, 
740,  745 

Relabel  To  Front,  755 
relabel  to  front  algorithm,  748  760 
phase  of,  758 
relation,  1163  1166 
relatively  prime,  931 
Relax,  649 
relaxation 

of  an  edge,  648  650 
linear  programming,  1125 
relaxed  heap,  530
relaxed  red  black  tree,  311  ex.
release  time,  447  pr. 
remainder,  54,  928 
remainder  instruction,  23 
repeated  squaring 

for  all  pairs  shortest  paths,  689  691 
for  raising  a  number  to  a  power,  956 
repeat,  in  pseudocode,  20 
repetition  factor,  of  a  string,  1012  pr.
Repetition  Matcher,  1013  pr.
representative  of  a  set,  561 
Reset,  459  ex. 
residual  capacity,  716,  719 
residual  edge,  716 
residual  network,  715  719 
residue,  54,  928,  982 pr. 
respecting  a  set  of  edges,  626 
return  edge,  779 
return,  in  pseudocode,  22 
return  instruction,  23 
reweighting 

in  all  pairs  shortest  paths,  700  702 
in  single  source  shortest  paths,  679  pr. 
rho  heuristic,  976  980,  980  ex.,  984 
ρ(n)  approximation  algorithm,  1106,  1123
Right,  152 
right  child,  1178 
right  conversion,  314  ex.
right  horizontal  ray,  1021  ex. 

Right  Rotate,  313 

right  rotation,  312 

right  spine,  333  pr. 

right  subtree,  1178 

rod  cutting,  360  370,  390  ex. 

root 

of  a  tree,  1176 
of  unity,  906  907 
of  Z*_n,  955
rooted  tree,  1176 

representation  of,  246  249 
root  list,  of  a  Fibonacci  heap,  509 
rotation 

cyclic,  1012  ex. 
in  a  red  black  tree,  312  314 
rotational  sweep,  1030  1038 
rounding,  1126 
randomized,  1139 


row  major  order,  394 
row  rank,  1223 
row  vector,  1218 

RSA  public  key  cryptosystem,  958  965,  983 

RS  vEB  tree,  557  pr. 

rule  of  product,  1184 

rule  of  sum,  1183 

running  time,  25 

average  case,  28,  116 

best  case,  29  ex.,  49 

expected,  28,  117 

of  a  graph  algorithm,  588 

and  multithreaded  computation,  779  780 

order  of  growth,  28 

rate  of  growth,  28 

worst  case,  27,  49 

sabermetrics,  412  n. 
safe  edge,  626 
Same  Component,  563 
sample  space,  1189 
sampling,  129  ex.,  179 
SAT,  1079 

satellite  data,  147,  229 
satisfiability,  1072,  1079  1081,  1105, 

1123  1124,  1127  ex.,  1139 
satisfiable  formula,  1049,  1079 
satisfying  assignment,  1072,  1079 
saturated  edge,  739 
saturating  push,  739,  745 
scalar  flow  product,  714  ex.
scalar  multiple,  1220 
scaling 

in  maximum  flow,  762  pr.,  765 
in  single  source  shortest  paths,  679  pr. 
scan,  807  pr. 

Scan,  807  pr. 
scapegoat  tree,  338 
schedule,  444,  1136  pr.
event  point,  1023 

scheduler,  for  multithreaded  computations, 
777,  781  783,  812
centralized,  782 
greedy,  782 

work  stealing  algorithm  for,  812 
scheduling,  443  446,  447  pr.,  450,  1104  pr.,
1136  pr.

Schur  complement,  820,  834 




Schur  complement  lemma,  834 
Scramble  Search,  143  pr. 
seam  carving,  409 pr.,  413 
Search,  230 
searching,  22  ex. 
binary  search,  39  ex.,  799  800 
in  binary  search  trees,  289  291 
in  B  trees,  491  492 
in  chained  hash  tables,  258 
in  compact  lists,  250  pr. 
in  direct  address  tables,  254 
for  an  exact  interval,  354  ex. 
in  interval  trees,  350  353 
linear  search,  22  ex. 
in  linked  lists,  237 

in  open  address  hash  tables,  270  271 
in  proto  van  Emde  Boas  structures,  540  541 
in  red  black  trees,  311 
in  an  unsorted  array,  143  pr. 
in  van  Emde  Boas  trees,  550
search  tree,  see  balanced  search  tree,  binary 
search  tree,  B  tree,  exponential  search 
tree,  interval  tree,  optimal  binary  search 
tree,  order  statistic  tree,  red  black  tree, 
splay  tree,  2-3  tree,  2-3-4  tree
secondary  clustering,  272 
secondary  hash  table,  278 
secondary  storage 

search  tree  for,  484  504 
stacks  on,  502  pr. 

second  best  minimum  spanning  tree,  638  pr. 
secret  key,  959,  962 

segment,  see  directed  segment,  line  segment 
Segments  Intersect,  1018 
Select,  220 
selection,  213 
of  activities,  415  422,  450 
and  comparison  sorts,  222 
in  expected  linear  time,  215  220 
multithreaded,  805  ex. 
in  order  statistic  trees,  340  341 
in  worst  case  linear  time,  220  224 
selection  sort,  29  ex. 
selector  vertex,  1093 
self  loop,  1168 

self  organizing  list,  476  pr.,  478 
semiconnected  graph,  621  ex.
sentinel,  31,  238  240,  309 


sequence  (⟨ ⟩)
bitonic,  682  pr. 
finite,  1166
infinite,  1166 

inversion  in,  41  pr.,  122  ex.,  345  ex. 
probe,  270 

sequential  consistency,  779,  812 
serial  algorithm  versus  parallel  algorithm,  772 
serialization,  of  a  multithreaded  algorithm, 
774,  776 

series,  108  pr.,  1146  1148 
strands  being  logically  in,  778 
set  ({}),  1158  1163 
cardinality  (|  |),  1161 
convex,  714  ex.
difference  (−),  1159
independent,  1101  pr.
intersection  (∩),  1159
member  (∈),  1158
not  a  member  (∉),  1158
union  (∪),  1159

set  covering  problem,  1117  1122,  1139 
weighted,  1135  pr.
set  partition  problem,  1101  ex. 
shadow  of  a  point,  1038  ex. 
shared  memory,  772 
Shell’s  sort,  42 
shift,  in  string  matching,  985 
shift  instruction,  24 
short  circuiting  operator,  22 
SHORTEST  PATH,  1050 
shortest  paths,  7,  643  707 
all  pairs,  644,  684  707 
Bellman  Ford  algorithm  for,  651  655 
with  bitonic  paths,  682  pr. 
and  breadth  first  search,  597  600,  644 
convergence  property  of,  650,  672  673 
and  difference  constraints,  664  670 
Dijkstra’s  algorithm  for,  658  664 
in  a  directed  acyclic  graph,  655  658 
in  ε  dense  graphs,  706  pr.
estimate  of,  648 

Floyd  Warshall  algorithm  for,  693  697, 

700  ex.,  706 

Gabow’s  scaling  algorithm  for,  679  pr. 
Johnson’s  algorithm  for,  700  706 
as  a  linear  program,  859  860 
and  longest  paths,  1048 
by  matrix  multiplication,  686  693,  706  707
and  negative  weight  cycles,  645,  653  654, 

692  ex.,  700  ex. 

with  negative  weight  edges,  645  646 
no  path  property  of,  650,  672 
optimal  substructure  of,  644  645,  687, 

693  694 

path  relaxation  property  of,  650,  673 
predecessor  subgraph  property  of,  650,  676 
problem  variants,  644 
and  relaxation,  648  650 
by  repeated  squaring,  689  691 
single  destination,  644 
single  pair,  381,  644 
single  source,  643  683 
tree  of,  647  648,  673  676 
triangle  inequality  of,  650,  671
in  an  unweighted  graph,  381, 597 
upper  bound  property  of,  650,  671  672 
in  a  weighted  graph,  643 
sibling,  1176 

side  of  a  polygon,  1020  ex. 
signature,  960 
simple  cycle,  1170
simple  graph,  1170
simple  path,  1170 
longest,  382,  1048 
simple  polygon,  1020  ex. 
simple  stencil  calculation,  809  pr. 
simple  uniform  hashing,  259 
simplex,  848 
Simplex,  871 

simplex  algorithm,  848,  864  879,  896  897 
single  destination  shortest  paths,  644 
single  pair  shortest  path,  381, 644 
as  a  linear  program,  859  860 
single  source  shortest  paths,  643  683 
Bellman  Ford  algorithm  for,  651  655
with  bitonic  paths,  682  pr. 
and  difference  constraints,  664  670 
Dijkstra’s  algorithm  for,  658  664 
in  a  directed  acyclic  graph,  655  658 
in  ε  dense  graphs,  706  pr.

Gabow’s  scaling  algorithm  for,  679  pr. 
as  a  linear  program,  863  ex. 
and  longest  paths,  1048 
singleton,  1161 

singly  connected  graph,  612  ex. 


singly  linked  list,  236 
see  also  linked  list 
singular  matrix,  1223 
singular  value  decomposition,  842 
sink  vertex,  593  ex.,  709,  712 
size 

of  an  algorithm’s  input,  25,  926  927, 

1055  1057 

of  a  binomial  tree,  527  pr. 
of  a  boolean  combinational  circuit,  1072 
of  a  clique,  1086 
of  a  set,  1161 

of  a  subtree  in  a  Fibonacci  heap,  524 
of  a  vertex  cover,  1089,  1108
skip  list,  338 
slack,  855 

slack  form,  846,  854  857 
uniqueness  of,  876 
slackness 

complementary,  894  pr. 
parallel,  781 
slack  variable,  855 
slot 

of  a  direct  access  table,  254 
of  a  hash  table,  256 

Slow  All  Pairs  Shortest  Paths,  689 
smoothed  analysis,  897 
★  Socrates,  790 
solution 

to  an  abstract  problem,  1054 
basic,  866 

to  a  computational  problem,  6 
to  a  concrete  problem,  1055 
feasible,  665,  846,  851 
infeasible,  851 
optimal,  851 

to  a  system  of  linear  equations,  814 
sorted  linked  list,  236 
see  also  linked  list 

sorting,  5,  16  20,  30  37,  147  212,  797  805 
bubblesort,  40  pr. 
bucket  sort,  200  204 
columnsort,  208  pr. 
comparison  sort,  191 
counting  sort,  194  197 
fuzzy,  189  pr. 
heapsort,  151  169 
insertion  sort,  12,  16  20 
k  sorting,  207  pr.
lexicographic,  304  pr. 
in  linear  time,  194  204,  206  pr. 
lower  bounds  for,  191  194,  211,  531
merge  sort,  12,  30  37,  797  805 
by  oblivious  compare  exchange  algorithms, 
208  pr. 

in  place,  17,  148,  206 pr. 
of  points  by  polar  angle,  1020  ex. 
probabilistic  lower  bound  for,  205  pr. 
quicksort,  170  190 
radix  sort,  197  200 
selection  sort,  29  ex. 

Shell’s  sort,  42 
stable,  196 

table  of  running  times,  149 
topological,  8,  612  615,  623 
using  a  binary  search  tree,  299  ex. 
with  variable  length  items,  206  pr. 

0-1  sorting  lemma,  208  pr.
sorting  network,  811 
source  vertex,  594,  644,  709,  712 
span  law,  780 
spanning  tree,  439,  624 
bottleneck,  640  pr. 
maximum,  1137  pr.
verification  of,  642 
see  also  minimum  spanning  tree 
span,  of  a  multithreaded  computation,  779 
sparse  graph,  589 

all  pairs  shortest  paths  for,  700  705 
and  Prim’s  algorithm,  638  pr. 
sparse  hulled  distribution,  1046  pr. 
spawn,  in  pseudocode,  776  777 
spawn  edge,  778 
speedup,  780 

of  a  randomized  multithreaded  algorithm, 

811  pr.
spindle,  485 
spine 

of  a  string  matching  automaton,  997  fig. 
of  a  treap,  333  pr. 
splay  tree,  338,  482 
spline,  840  pr. 
splitting 

of  B  tree  nodes,  493  495 
of  2  3  4  trees,  503  pr. 
splitting  summations,  1152  1154


spurious  hit,  991 
square  matrix,  1218 
Square  Matrix  Multiply, 75, 689 
Square  Matrix  Multiply  Recursive, 
77 

square  of  a  directed  graph,  593  ex. 
square  root,  modulo  a  prime,  982  pr. 
squaring,  repeated 
for  all  pairs  shortest  paths,  689  691 
for  raising  a  number  to  a  power,  956 
stability 

numerical,  813,  815,  842 
of  sorting  algorithms,  196,  200  ex. 
stack,  232  233 
in  Graham’s  scan,  1030 
implemented  by  queues,  236  ex. 
linked  list  implementation  of,  240  ex. 
operations  analyzed  by  accounting  method, 
457  458 

operations  analyzed  by  aggregate  analysis, 
452  454 

operations  analyzed  by  potential  method, 
460  461 

for  procedure  execution,  188  pr. 
on  secondary  storage,  502  pr. 

Stack  Empty,  233 
standard  deviation,  1200 
standard  encoding  (⟨  ⟩),  1057
standard  form,  846,  850  854 
star  shaped  polygon,  1038  ex. 
start  state,  995 
start  time,  415 

state  of  a  finite  automaton,  995 
static  graph,  562  n. 
static  set  of  keys,  277 
static  threading,  773 
stencil,  809  pr. 
stencil  calculation,  809  pr. 

Stirling’s  approximation,  57 
storage  management,  151,  243  244,  245  ex., 
261  ex. 

store  instruction,  23 
straddle,  1017 
strand,  777 
final,  779 
independent,  789 
initial,  779 

logically  in  parallel,  778 
logically  in  series,  778
Strassen’s  algorithm,  79  83,  111  112
multithreaded,  795  796 
streaks,  135  139 
strictly  decreasing,  53 
strictly  increasing,  53 
string,  985,  1184
string  matching,  985  1013 

based  on  repetition  factors,  1012  pr.
by  finite  automata,  995  1002 
with  gap  characters,  989  ex.,  1002  ex. 

Knuth  Morris  Pratt  algorithm  for,
1002  1013

naive  algorithm  for,  988  990 
Rabin  Karp  algorithm  for,  990  995,  1013 
string  matching  automaton,  996  1002,
1002  ex.

strongly  connected  component,  1170
decomposition  into,  615  621,  623
Strongly  Connected  Components,  617 
strongly  connected  graph,  1170 
subgraph,  1171 

predecessor,  see  predecessor  subgraph 
subgraph  isomorphism  problem,  1100  ex.
subgroup,  943  946 
subpath,  1170 

subproblem  graph,  367  368 
subroutine 
calling,  21,  23,  25  n. 
executing,  25  n. 
subsequence,  391 
subset  (⊆),  1159,  1161
hereditary  family  of,  437 
independent  family  of,  437 
SUBSET  SUM,  1097 
subset  sum  problem 

approximation  algorithm  for,  1128  1134,
1139 

NP  completeness  of,  1097  1100 
with  unary  target,  1101  ex. 
substitution  method,  83  88 
and  recursion  trees,  91  92 
substring,  1184 
subtract  instruction,  23 
subtraction  of  matrices,  1221 
subtree,  1176 

maintaining  sizes  of,  in  order  statistic  trees, 
343  344 


success,  in  a  Bernoulli  trial,  1201 
successor 

in  binary  search  trees,  291  292 
in  a  bit  vector  with  a  superimposed  binary 
tree,  533 

in  a  bit  vector  with  a  superimposed  tree  of 
constant  height,  535 

finding  i  th,  of  a  node  in  an  order  statistic 
tree,  344  ex. 
in  linked  lists,  236 
in  order  statistic  trees,  347  ex. 
in  proto  van  Emde  Boas  structures,  543  544 
in  red  black  trees,  311 
in  van  Emde  Boas  trees,  550  551
Successor, 230 
such  that  (:),  1159
suffix  (⊐),  986
suffix  function,  996 
suffix  function  inequality,  999 
suffix  function  recursion  lemma,  1000 
sum  (∑),  1145
Cartesian,  906  ex. 
infinite,  1145 
of  matrices,  1220 
of  polynomials,  898 
rule  of,  1183 
telescoping,  1148
Sum  Arrays,  805  pr. 

Sum  Arrays',  805  pr. 
summary 

in  a  bit  vector  with  a  superimposed  tree  of 
constant  height,  534 
in  proto  van  Emde  Boas  structures,  540 
in  van  Emde  Boas  trees,  546 
summation,  1145  1157 

in  asymptotic  notation,  49  50,  1146
bounding,  1149  1156
formulas  and  properties  of,  1145  1149
linearity  of,  1146 
summation  lemma,  908 
supercomputer,  772 
superpolynomial  time,  1048 
supersink,  712 
supersource,  712 
surjection,  1167 
SVD,  842 

sweeping,  1021  1029,  1045  pr. 
rotational,  1030  1038 
sweep  line,  1022
sweep  line  status,  1023  1024 
symbol  table,  253,  262,  265 
symmetric  difference,  763  pr. 
symmetric  matrix,  1220,  1222  ex.,  1226  ex.
symmetric  positive  definite  matrix,  832  835, 
842 

symmetric  relation,  1163 
symmetry  of  Θ  notation,  52
sync,  in  pseudocode,  776  777 
system  of  difference  constraints,  664  670 
system  of  linear  equations,  806 pr.,  813  827, 
840  pr. 

Table  Delete,  468 
Table  Insert, 464 
tail 

of  a  binomial  distribution,  1208  1215 
of  a  linked  list,  236 
of  a  queue,  234 
tail  recursion,  188  pr.,  419 
Tail  Recursive  Quicksort,  188  pr. 
target,  1097 

Tarjan’s  off  line  least  common  ancestors 
algorithm,  584  pr.
task,  443 

Task  Parallel  Library,  774 

task  scheduling,  443  446,  448  pr.,  450 

tautology,  1066  ex.,  1086  ex. 

Taylor  series,  306  pr. 
telescoping  series,  1148 
telescoping  sum,  1148
testing 

of  primality,  965  975,  983 
of  pseudoprimality,  966  968 
text,  in  string  matching,  985 
then  clause,  20  n. 

Theta  notation,  44  47,  64 
thread,  773 

Threading  Building  Blocks,  774 
3  CNF,  1082 
3  CNF  SAT,  1082 

3  CNF  satisfiability,  1082  1085,  1105
approximation  algorithm  for,  1123  1124, 
1139 

and  2  CNF  satisfiability,  1049 
3  COLOR,  1103  pr.

3  conjunctive  normal  form,  1082 


tight  constraint,  865 
time,  see  running  time 
time  domain,  898 
time  memory  trade  off,  365 
timestamp,  603,  611  ex.

Toeplitz  matrix,  921  pr. 
to,  in  pseudocode,  20 
TOP,  1031 

top  down  method,  for  dynamic  programming, 
365 

top  of  a  stack,  232 
topological  sort,  8,  612  615,  623 

in  computing  single  source  shortest  paths  in 
a  dag,  655 

Topological  Sort,  613 
total  order,  1165 
total  path  length,  304  pr. 
total  preorder,  1165 
total  relation,  1165 
tour 

bitonic,  405  pr. 

Euler,  623  pr.,  1048 
of  a  graph,  1096 
track,  486 
tractability,  1048 
trailing  pointer,  295 

transition  function,  995,  1001  1002,  1012  ex.
transitive  closure,  697  699 

and  boolean  matrix  multiplication,  832  ex. 
of  dynamic  graphs,  705  pr.,  707 
Transitive  Closure,  698 
transitive  relation,  1163 
transitivity  of  asymptotic  notation,  51
Transplant,  296,  323 
transpose 

conjugate,  832  ex. 

of  a  directed  graph,  592  ex. 

of  a  matrix,  1217 

of  a  matrix,  multithreaded,  792  ex. 
transpose  symmetry  of  asymptotic  notation,  52 
traveling  salesman  problem 

approximation  algorithm  for,  1111  1117, 
1139 

bitonic  euclidean,  405  pr. 
bottleneck,  1117  ex.

NP  completeness  of,  1096  1097 
with  the  triangle  inequality,  1112  1115 
without  the  triangle  inequality,  1115  1116 
traversal  of  a  tree,  287,  293  ex.,  342,  1114
treap,  333  pr.,  338 
Treap  Insert,  333  pr. 
tree,  1173  1180 
AA  trees,  338 
AVL,  333  pr.,  337 
binary,  see  binary  tree 
binomial,  527  pr. 
bisection  of,  1181  pr. 
breadth  first,  594,  600 
B  trees,  484  504 
decision,  192  193 
depth  first,  603 
diameter  of,  602  ex. 
dynamic,  482 
free,  1172  1176
full  walk  of,  1114 
fusion,  212,  483 
heap, 151  169 
height  balanced,  333  pr. 
height  of,  1177 
interval,  348  354 
k  neighbor,  338 

minimum  spanning,  see  minimum  spanning 
tree 

optimal  binary  search,  397  404,  413 
order  statistic,  339  345 
parse,  1082 
recursion,  37,  88  93 
red  black,  see  red  black  tree 
rooted,  246  249,  1176
scapegoat,  338 
search,  see  search  tree 
shortest  paths,  647  648,  673  676 
spanning,  see  minimum  spanning  tree, 
spanning  tree 
splay,  338,  482 
treap,  333  pr.,  338 
2  3,  337,  504 
2  3  4, 489,  503  pr. 
van  Emde  Boas,  531  560 
walk  of,  287,  293  ex.,  342,  1114
weight  balanced  trees,  338 
Tree  Delete,  298,  299  ex.,  323  324 
tree  edge,  601,  603,  609 
Tree  Insert,  294,  315
Tree  Maximum,  291 
Tree  Minimum,  291 


Tree  Predecessor,  292 
Tree  Search,  290 
Tree  Successor,  292 
tree  walk,  287,  293  ex.,  342,  1114 
trial,  Bernoulli,  1201 
trial  division,  966 
triangle  inequality,  1112 
for  shortest  paths,  650,  671 
triangular  matrix,  1219,  1222  ex.,  1225  ex. 
trichotomy,  interval,  348 
trichotomy  property  of  real  numbers,  52 
tridiagonal  linear  systems,  840  pr. 
tridiagonal  matrix,  1219 
trie  (radix  tree),  304  pr. 

y  fast,  558  pr. 

Trim,  1130 

trimming  a  list,  1130 

trivial  divisor,  928 

truth  assignment,  1072,  1079 

truth  table,  1070 

TSP,  1096 

tuple,  1162 

twiddle  factor,  912 

2  CNF  SAT,  1086  ex. 

2  CNF  satisfiability,  1086  ex. 

and  3  CNF  satisfiability,  1049 
two  pass  method,  571 
2  3  4  heap,  529  pr. 

2  3  4  tree,  489 
joining,  503  pr. 
splitting,  503  pr. 

2  3  tree,  337,  504 

unary,  1056 

unbounded  linear  program,  851 
unconditional  branch  instruction,  23 
uncountable  set,  1161 

underdetermined  system  of  linear  equations, 
814 

underflow 

of  a  queue,  234 
of  a  stack,  233 
undirected  graph,  1168 

articulation  point  of,  621  pr.

biconnected  component  of,  621  pr. 

bridge  of,  621  pr.

clique  in,  1086 

coloring  of,  1103  pr.,  1180  pr.
computing  a  minimum  spanning  tree  in,
624  642 

converting  to,  from  a  multigraph,  593  ex. 

d  regular,  136  ex. 

grid,  760  pr. 

hamiltonian,  1061 

independent  set  of,  1101  pr. 

matching  of,  732 

nonhamiltonian,  1061 

vertex  cover  of,  1089,  1108 

see  also  graph 

undirected  version  of  a  directed  graph,  1172 
uniform  hashing,  271 

uniform  probability  distribution,  1191  1192 
uniform  random  permutation,  116,  125 
union 

of  dynamic  sets,  see  uniting 
of  languages,  1058 
of  sets  (∪),  1159
Union,  505, 562 

disjoint  set  forest  implementation  of,  571
linked  list  implementation  of,  565  567, 
568  ex. 

union  by  rank,  569 

unique  factorization  of  integers,  931

unit  (1),  928 

uniting 

of  Fibonacci  heaps,  511  512 
of  heaps,  506 
of  linked  lists,  241  ex. 
of  2  3  4  heaps,  529  pr. 
unit  lower  triangular  matrix,  1219 
unit  time  task,  443 
unit  upper  triangular  matrix,  1219 
unit  vector,  1218 

universal  collection  of  hash  functions,  265 
universal  hashing,  265  268 
universal  sink,  593  ex. 
universe,  1160 

of  keys  in  van  Emde  Boas  trees,  532 
universe  size,  532 
unmatched  vertex,  732 
unsorted  linked  list,  236 
see  also  linked  list 
until,  in  pseudocode,  20 
unweighted  longest  simple  paths,  382 
unweighted  shortest  paths,  381 
upper  bound,  47 


upper  bound  property,  650,  671  672 
upper  median,  213 
upper  square  root  (↑√),  546
upper  triangular  matrix,  1219,  1225  ex. 

valid  shift,  985 
value 

of  a  flow,  710 
of  a  function,  1166 
objective,  847,  851

value  over  replacement  player,  411  pr.
Vandermonde  matrix,  902,  1226  pr. 
van  Emde  Boas  tree,  531  560 
cluster  in,  546 

compared  with  proto  van  Emde  Boas 
structures,  547 
deletion  from,  554  556 
insertion  into,  552  554 
maximum  in,  550 
membership  in,  550 
minimum  in,  550 
predecessor  in,  551  552
with  reduced  space,  557  pr. 
successor  in,  550  551 
summary  in,  546 
Var  [  ]  (variance),  1199 
variable 
basic,  855 
entering,  867 
leaving,  867 
nonbasic,  855 
in  pseudocode,  21 
random,  1196  1201 
slack,  855 

see  also  indicator  random  variable 
variable  length  code,  429 
variance,  1199 

of  a  binomial  distribution,  1205 
of  a  geometric  distribution,  1203 
vEB  Empty  Tree  Insert,  553 
vEB  tree,  see  van  Emde  Boas  tree 
vEB  Tree  Delete,  554 
vEB  Tree  Insert,  553 
vEB  Tree  Maximum,  550 
vEB  Tree  Member,  550 
vEB  Tree  Minimum,  550 
vEB  Tree  Predecessor,  552 
vEB  Tree  Successor, 551 
vector,  1218,  1222  1224
convolution  of,  901 
cross  product  of,  1016 
orthonormal,  842 
in  the  plane,  1015 
Venn  diagram,  1160 
verification,  1061  1066 
of  spanning  trees,  642 
verification  algorithm,  1063 
vertex 

articulation  point,  621  pr. 
attributes  of,  592 
capacity  of,  714  ex.
in  a  graph,  1168 
intermediate,  693 
isolated,  1169 
overflowing,  736 
of  a  polygon,  1020  ex. 
relabeled,  740 
selector,  1093 

vertex  cover,  1089,  1108,  1124  1127,  1139 
VERTEX  COVER,  1090 
vertex  cover  problem 

approximation  algorithm  for,  1108  1111,
1139 

NP  completeness  of,  1089  1091,  1105
vertex  set,  1168 

violation,  of  an  equality  constraint,  865 
virtual  memory,  24 
Viterbi  algorithm,  408  pr. 

VORP,  411  pr.

walk  of  a  tree,  287,  293  ex.,  342,  1114 
weak  duality,  880  881,  886  ex.,  895  pr.
weight 

of  a  cut,  1127  ex. 
of  an  edge,  591 
mean,  680  pr. 
of  a  path,  643 

weight  balanced  tree,  338,  473  pr. 
weighted  bipartite  matching,  530 
weighted  matroid,  439  442 
weighted  median,  225  pr. 
weighted  set  covering  problem,  1135  pr. 
weighted  union  heuristic,  566 
weighted  vertex  cover,  1124  1127,  1139 
weight  function 
for  a  graph,  591 


in  a  weighted  matroid,  439 
while,  in  pseudocode,  20 
white  path  theorem,  608 
white  vertex,  594,  603 
widget,  1092 
wire,  1071 
Witness,  969 

witness,  to  the  compositeness  of  a  number,  968 
work  law,  780 

work,  of  a  multithreaded  computation,  779 
work  stealing  scheduling  algorithm,  812 
worst  case  running  time,  27,  49 

Yen’s  improvement  to  the  Bellman  Ford 
algorithm,  678  pr. 
y  fast  trie,  558  pr. 

Young  tableau,  167  pr. 

Z  (set  of  integers),  1158 
Z_n  (equivalence  classes  modulo  n),  928
Z*_n  (elements  of  multiplicative  group
modulo  n),  941 

Z+_n  (nonzero  elements  of  Z_n),  967
zero  matrix,  1218 

zero  of  a  polynomial  modulo  a  prime,  950  ex. 

0  1  integer  programming,  1100  ex.,  1125 
0  1  knapsack  problem,  425,  427  ex.,  1137  pr., 
1139 

0  1  sorting  lemma,  208  pr. 
zonk,  1195  ex. 


